Abstract

We show that a dynamic logit model for binary panel data allowing for state dependence 
and unobserved heterogeneity may be accurately approximated by a quadratic exponen- 
tial model, the parameters of which have the same interpretation that they have in the 
true model. We also show how the parameters for the unobserved heterogeneity can be
eliminated from the approximating model by conditioning on the total scores, i.e. the sums
of the response variables for each individual in the panel. This allows us to construct an
approximate conditional likelihood for the dynamic logit model, the maximization of which
yields estimates of the parameters for the covariates and the state dependence. This
estimator is very simple to compute and, by means of a simulation study, we show that it
is competitive in terms of efficiency with the estimator of Honoré & Kyriazidou (2000).
Finally, we outline the extension of the proposed approach to more elaborate
structures for the state dependence and to categorical response variables with
more than two levels.
1 Introduction



An important issue in the econometric literature is the investigation of the so-called state 
dependence, i.e. how the experience of an event in the past can influence the occurrence 
of the same event in the future (see Heckman, 1981a, 1981b). This phenomenon arises in 
many economic applications, such as job decision, investment choice and brand choice. A 
correct analysis of this phenomenon should take into account the unobserved heterogeneity
between individuals in their propensity to experience a certain outcome in all
periods. This heterogeneity gives rise to a spurious state dependence that, as underlined by Heckman,
must be disentangled from the true state dependence in the analysis of a panel data set,
since the two can lead, for instance, to different policy implications.

In the case of binary response variables, panel data are usually analyzed through a dynamic
logit or probit model which includes, among the explanatory variables, the lagged response
variable (true state dependence) and has an individual-specific intercept (unobserved
heterogeneity); see Hsiao (1986) and Arellano & Honoré (2001), among others. When the latter is
treated as a fixed parameter, the approach suffers from the so-called incidental parameter
problem (Neyman & Scott, 1948), which leads to inconsistent estimates of the structural
parameters for the covariates and the true state dependence. For this reason, the
individual-specific intercept is frequently treated as a random parameter (see, for instance, Hyslop,
1999). This requires a distribution to be formulated for this parameter, whose dependence
on the covariates has to be suitably modelled. In this case, the problem of
specifying the initial conditions of the dynamic panel process also arises, and the
estimation of the resulting model usually involves multiple integrals which may be cumbersome to
compute.

When a logit model is assumed, an alternative approach for eliminating the dependence of
the joint distribution of the response variables on the incidental parameters is to condition
on suitable statistics. In particular, when the lagged response variable is omitted from
the model, and therefore true state dependence is not considered, the obvious statistics on which
to condition are the sums of the response variables at the individual level. These are sufficient
statistics for the incidental parameters and, using a terminology derived from Rasch (1961),
will be referred to as total scores. The resulting maximum likelihood estimator of the other
parameters may be computed by means of a simple Newton-Raphson algorithm and has optimal
asymptotic properties (see Andersen, 1970, 1972). A conditional likelihood approach can also
be followed when the assumed logit model includes the lagged response variable. In particular,
by exploiting an intuition of Chamberlain (1985), Honoré & Kyriazidou (2000) proposed
a weighted conditional likelihood that may be used to consistently estimate the structural
parameters. The statistics on which they condition differ from the total scores and are
such that a larger number of response configurations do not contribute to the likelihood.
Moreover, the approach requires the specification of a suitable kernel function for weighting
the response configuration of each subject on the basis of the covariates.

In this paper, we propose a conditional approach for estimating the parameters of a dy- 
namic logit model for binary panel data which is based on the approximation of the model 
through a particular quadratic exponential model (Cox, 1972). This approximation is found by 
following a method similar to that adopted by Cox & Wermuth (1994) in a different context. 
The approximating model is in practice a log-linear model for the conditional distribution of 
the response variables given the initial observation and the covariates. The main effects of this 
model depend on the covariates and on an individual-specific parameter for the unobserved 
heterogeneity, while the two-way interaction effects are equal to a common parameter when
they refer to a pair of consecutive response variables and to 0 otherwise. We show that
this interaction parameter has the same interpretation as in the dynamic logit model in terms
of log-odds ratio, a measure of association between binary variables which is well known in the
statistical literature on categorical data analysis (Agresti, 2002, Ch. 8).

An interesting feature of the approximating model is that the parameters for the unobserved
heterogeneity may be eliminated by conditioning on the total scores. This allows us to
construct an approximate conditional likelihood for the dynamic logit model, the maximization
of which yields an estimator of the structural parameters. This estimator is as simple to
compute as the one used in the absence of state dependence and, unlike the estimator of
Honoré & Kyriazidou (2000), does not require a weighting function to be formulated. The asymptotic
properties of this estimator, when the approximating model holds, are proved on the basis
of standard inferential results (Newey & McFadden, 1994). Under the true model, instead,
they are studied by means of a simulation study performed along the same lines as Honoré
& Kyriazidou (2000). These simulations show that the proposed estimator is usually more
efficient than their estimator. This is mainly because our approach is based on a
likelihood to which a larger number of response configurations contribute than to the
likelihood on which their estimator is based. We also outline the extension of the proposed
approach to the case in which the logit model includes a second-order lagged response variable
and to that of categorical response variables with more than two levels.

The paper is organized as follows. In the next section we briefly review the dynamic logit 
model for binary panel data and describe the weighted conditional likelihood approach of 
Honoré & Kyriazidou (2000); we consider this as a benchmark approach for the estimation
of the model at issue. The proposed approximating model is described in Section 3, where 
its conditional distribution given the total scores is also derived. The resulting conditional 
maximum likelihood estimator is described in Section 4, where the asymptotic properties of this 
estimator under the approximating model are also illustrated. The results of the simulation 
study are shown in Section 5. Finally, in Section 6 we outline some possible extensions of the 
proposed approach and in Section 7 we draw the main conclusions. 

All the algorithms described in this paper have been implemented in Matlab functions 
which are available at the webpage www.stat.unipg.it/~bart. 

2 Dynamic logit models for binary panel data 

In the following, we first review the dynamic logit model for binary panel data and then we 
discuss conditional maximum likelihood estimation of its structural parameters. 

2.1 Basic assumptions 

Let $y_{it}$ be a binary random variable equal to 1 if subject $i$ ($i = 1, \ldots, n$) in the panel makes
a certain choice at time $t$ ($t = 1, \ldots, T$) and to 0 otherwise; also let $x_{it}$ be a corresponding
vector of strictly exogenous covariates of size $k$. The standard econometric model for variables
of this type assumes that

$$y_{it} = 1\{\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma + \varepsilon_{it} > 0\}, \quad i = 1, \ldots, n, \quad t = 1, \ldots, T, \tag{1}$$

where $1\{\cdot\}$ is the indicator function, $\alpha_i$ is a fixed or random individual-specific parameter,
the zero-mean random variables $\varepsilon_{it}$ represent error terms and the initial observations $y_{i0}$ are
assumed to be exogenous. Moreover, $\beta$ is a vector of parameters for the covariates and $\gamma$
is a parameter measuring the state dependence effect. The interest is mostly in the last
two. These will be referred to as structural parameters and, in the following, will be jointly
denoted by $\theta = (\beta', \gamma)'$. The parameters $\alpha_i$ are instead considered as incidental parameters,
the estimation of which is of minor interest.

The typical assumption when the incidental parameters are treated as fixed parameters
is that the error terms $\varepsilon_{it}$ are independent and identically distributed conditionally on the
covariates, with standard logistic distribution. Therefore, for any subject $i$, the conditional
distribution of $y_{it}$ given $\alpha_i$, $X_i = (x_{i1} \cdots x_{iT})$ and $y_{i0}, \ldots, y_{i,t-1}$ may be expressed as

$$p(y_{it} \mid \alpha_i, X_i, y_{i0}, \ldots, y_{i,t-1}) = p(y_{it} \mid \alpha_i, x_{it}, y_{i,t-1}) = \frac{\exp[y_{it}(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)]}{1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)}, \quad t = 1, \ldots, T. \tag{2}$$

This is a dynamic logit formulation which implies the following conditional distribution of the
overall vector of response variables $y_i = (y_{i1}, \ldots, y_{iT})$ given $\alpha_i$, $X_i$ and $y_{i0}$:

$$p(y_i \mid \alpha_i, X_i, y_{i0}) = \frac{\exp(y_{i+}\alpha_i + \sum_t y_{it} x_{it}'\beta + y_{i\times}\gamma)}{\prod_t [1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)]}, \tag{3}$$

where $y_{i+} = \sum_t y_{it}$ and $y_{i\times} = \sum_t y_{i,t-1} y_{it}$, with the product $\prod_t$ and the sum $\sum_t$ ranging over
$t = 1, \ldots, T$.

For what follows, it is important to note some features of the dependence structure between
the response variables in $y_i$, given $\alpha_i$, $X_i$ and $y_{i0}$, implied by the model above. First of all,
for $t = 1, \ldots, T-1$, $y_{it}$ is conditionally independent of any other response variable
given $y_{i,t-1}$ and $y_{i,t+1}$. Moreover, since for $t = 1, \ldots, T$ we have that

$$\log\frac{p(y_{it}=0 \mid \alpha_i, x_{it}, y_{i,t-1}=0)\, p(y_{it}=1 \mid \alpha_i, x_{it}, y_{i,t-1}=1)}{p(y_{it}=0 \mid \alpha_i, x_{it}, y_{i,t-1}=1)\, p(y_{it}=1 \mid \alpha_i, x_{it}, y_{i,t-1}=0)} = \log\frac{\exp(\alpha_i + x_{it}'\beta + \gamma)}{\exp(\alpha_i + x_{it}'\beta)} = \gamma,$$

the parameter $\gamma$ for the state dependence is simply the log-odds ratio between any
pair of variables $(y_{i,t-1}, y_{it})$, conditionally on all the other response variables or marginally
with respect to these variables.
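The two properties just described can be checked numerically. The following is a minimal sketch, assuming a single scalar covariate per period and hypothetical parameter values chosen only for illustration; the function names are ours and do not come from the Matlab implementation mentioned in Section 1. It builds the joint probability (3) as a product of the transition probabilities (2), verifies that it sums to one over all response configurations, and verifies that the log-odds ratio between consecutive responses equals the state dependence parameter.

```python
import itertools
import math

def transition_prob(y_t, y_prev, alpha, x_t, beta, gamma):
    """Conditional probability (2) of y_t given the previous response (scalar covariate)."""
    eta = alpha + x_t * beta + y_prev * gamma
    return math.exp(y_t * eta) / (1.0 + math.exp(eta))

def joint_prob(y, y0, alpha, x, beta, gamma):
    """Joint probability (3), built as the product of the T transition probabilities."""
    prob, y_prev = 1.0, y0
    for y_t, x_t in zip(y, x):
        prob *= transition_prob(y_t, y_prev, alpha, x_t, beta, gamma)
        y_prev = y_t
    return prob

def log_odds_ratio(alpha, x_t, beta, gamma):
    """Log-odds ratio between y_{t-1} and y_t implied by (2)."""
    def p(y_t, y_prev):
        return transition_prob(y_t, y_prev, alpha, x_t, beta, gamma)
    return math.log(p(0, 0) * p(1, 1) / (p(0, 1) * p(1, 0)))

# Hypothetical values, chosen only for illustration.
alpha, beta, gamma = -0.3, 1.0, 0.5
x = [0.2, -1.1, 0.7]
total = sum(joint_prob(z, 1, alpha, x, beta, gamma)
            for z in itertools.product((0, 1), repeat=len(x)))
print(total, log_odds_ratio(alpha, x[0], beta, gamma))
```

The log-odds ratio does not change if `alpha` or `x_t` is changed, which is the sense in which it isolates the true state dependence.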



2.2 Conditional inference 

As mentioned in Section 1, an interesting approach for estimating the fixed-effects model
illustrated above is based on the maximization of the conditional likelihood given suitable
statistics. For the case in which the model includes the lagged response variable, one of the
first authors to deal with this approach was Chamberlain (1985). In particular, he noticed
that when $T = 3$ and the covariates are omitted from the model, so that

$$p(y_{it} \mid \alpha_i, y_{i0}, \ldots, y_{i,t-1}) = p(y_{it} \mid \alpha_i, y_{i,t-1}) = \frac{\exp[y_{it}(\alpha_i + y_{i,t-1}\gamma)]}{1 + \exp(\alpha_i + y_{i,t-1}\gamma)}, \quad t = 1, \ldots, T,$$

then $p(y_i \mid \alpha_i, y_{i0}, y_{i1} + y_{i2} = 1, y_{i3})$ does not depend on $\alpha_i$ for any $y_{i0}$ and $y_{i3}$. On the basis of
this conditional distribution it is therefore possible to construct a likelihood which depends on
the response configurations of only certain subjects (those for which $y_{i1} + y_{i2} = 1$) and which
allows the parameter $\gamma$ to be consistently estimated.
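Chamberlain's observation can be verified by direct enumeration. The sketch below, with hypothetical function names and parameter values, computes for T = 3 and no covariates the probability of the configuration with first response 1 and second response 0, conditional on the initial and final responses and on the first two responses summing to one, for several values of the individual intercept.

```python
import math

def joint_prob(y, y0, alpha, gamma):
    """Joint probability of (y1, y2, y3) under the dynamic logit model without covariates."""
    prob, y_prev = 1.0, y0
    for y_t in y:
        eta = alpha + y_prev * gamma
        prob *= math.exp(y_t * eta) / (1.0 + math.exp(eta))
        y_prev = y_t
    return prob

def chamberlain_prob(y0, y3, alpha, gamma):
    """P(y1 = 1, y2 = 0 | alpha, y0, y1 + y2 = 1, y3) for T = 3."""
    p10 = joint_prob((1, 0, y3), y0, alpha, gamma)
    p01 = joint_prob((0, 1, y3), y0, alpha, gamma)
    return p10 / (p10 + p01)

gamma = 0.5  # hypothetical value
for y0 in (0, 1):
    for y3 in (0, 1):
        values = [chamberlain_prob(y0, y3, a, gamma) for a in (-2.0, 0.0, 3.0)]
        print(y0, y3, values)
```

For each pair of initial and final responses the three printed values coincide, and they depend on the data only through the difference between the initial and the final response.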

The conditional approach above was extended by Honoré & Kyriazidou (2000) to the case
where, as in (2), the model includes exogenous covariates. In particular, they noticed that
$p(y_i \mid \alpha_i, X_i, y_{i0}, y_{i1} + y_{i2} = 1, y_{i3})$ is independent of $\alpha_i$ provided that $x_{i2} = x_{i3}$. When this
happens with positive probability, we can therefore estimate the structural parameters by
maximizing a conditional likelihood whose logarithm may be expressed as

$$\sum_i 1\{y_{i1} + y_{i2} = 1\}\, 1\{x_{i2} - x_{i3} = 0\} \log[p(y_i \mid \alpha_i, X_i, y_{i0}, y_{i1} + y_{i2} = 1, y_{i3})].$$

For the case in which $p(x_{i2} = x_{i3}) = 0$, which typically occurs in the presence of continuous
covariates, Honoré & Kyriazidou (2000) proposed to estimate $\theta$ by maximizing a weighted
conditional likelihood defined as above, with the exception that $1\{x_{i2} - x_{i3} = 0\}$ is replaced
by a kernel density function $K(\cdot)$. The logarithm of this likelihood is

$$\sum_i 1\{y_{i1} + y_{i2} = 1\}\, K\!\left(\frac{x_{i2} - x_{i3}}{\sigma_n}\right) \log[p(y_i \mid \alpha_i, X_i, y_{i0}, y_{i1} + y_{i2} = 1, y_{i3})], \tag{4}$$

with the bandwidth $\sigma_n$ fixed a priori. Note that the weight given to the response configuration
of subject $i$ decreases with the distance between $x_{i2}$ and $x_{i3}$, while a large weight is given
to the response configuration of this subject when $x_{i2}$ is close to $x_{i3}$, so that the
independence of $p(y_i \mid \alpha_i, X_i, y_{i0}, y_{i1} + y_{i2} = 1, y_{i3})$ from $\alpha_i$ approximately holds.
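As an illustration of how a log-likelihood of type (4) may be evaluated, the sketch below treats the case T = 3 with one scalar covariate. Two ingredients are assumptions on our part and are not fixed by the text above: the Gaussian kernel, and the closed form of the conditional probability, which for equal second and third covariate values is a logistic function of the index (x1 - x2)β + γ(y0 - y3), as can be checked by direct calculation from (3). Function names and data are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard Normal density, used here as an assumed kernel K(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def weighted_loglik(beta, gamma, data, bandwidth):
    """Weighted conditional log-likelihood of type (4) for T = 3.

    data: iterable of (y0, (y1, y2, y3), (x1, x2, x3)) with scalar covariates.
    """
    total = 0.0
    for y0, (y1, y2, y3), (x1, x2, x3) in data:
        if y1 + y2 != 1:
            continue  # configuration does not contribute
        weight = gaussian_kernel((x2 - x3) / bandwidth)
        index = (x1 - x2) * beta + gamma * (y0 - y3)
        total += weight * (y1 * index - math.log(1.0 + math.exp(index)))
    return total

# Hypothetical data: the second subject has y1 + y2 = 2 and is ignored.
data = [(1, (1, 0, 0), (0.5, 0.2, 0.2)), (0, (1, 1, 0), (0.1, 0.3, 0.9))]
print(weighted_loglik(1.0, 0.5, data, bandwidth=0.5))
```

The function only evaluates the objective; maximizing it and choosing the bandwidth are separate steps that this sketch does not address.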

Honoré & Kyriazidou (2000) also showed how the weighted conditional approach may be
used in the case of $T > 3$. In this case, the approach is based on a pairwise weighted likelihood
whose logarithm is given by the sum, for any pair of response variables $(y_{is}, y_{it})$, $1 \le s < t < T$,
of an expression similar to (4) referred to this pair of variables. They also dealt with dynamic
logit models including more than one lagged response variable and multinomial logit models
for response variables having more than two levels, and suggested a version of the Manski (1987)
conditional maximum score estimator which does not require any distribution to be formulated for
the error terms.

Although the weighted conditional estimator of Honoré & Kyriazidou (2000) is of great
interest, its use requires a careful choice of the kernel function and of its bandwidth. This choice
obviously affects the performance of the estimator. Moreover, since only certain response
configurations are considered (e.g. those for which $y_{i1} + y_{i2} = 1$ and $x_{i2}$ is near $x_{i3}$ in the
binary case with $T = 3$), the actual sample size, i.e. the number of response configurations
which contribute to the likelihood, is usually much smaller than the nominal sample size
$n$. This may limit the efficiency of the estimator. Finally, Honoré & Kyriazidou
(2000) reported some problems in applying their approach in the presence of time dummies.

3 Proposed approximation 

In this section, we introduce a quadratic exponential model for binary panel data that ap- 
proximates the dynamic logit model illustrated above and we discuss its main features in 
comparison to the true model. 



3.1 Approximating quadratic exponential model 

Along the same lines followed by Cox & Wermuth (1994) in a different context, we first take
the logarithm of $p(y_i \mid \alpha_i, X_i, y_{i0})$ as defined in (3), i.e.

$$\log[p(y_i \mid \alpha_i, X_i, y_{i0})] = y_{i+}\alpha_i + \sum_t y_{it} x_{it}'\beta + y_{i\times}\gamma - \sum_t \log[1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)]. \tag{5}$$

We then approximate the component which is not linear in the parameters on the basis of a
first-order Taylor series expansion around $\alpha_i = 0$, $\beta = 0$ and $\gamma = 0$, obtaining

$$\sum_t \log[1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)] \approx \sum_t [\log(2) + 0.5\alpha_i + 0.5 x_{it}'\beta] + 0.5 y_{i*}\gamma, \tag{6}$$

with $y_{i*} = \sum_t y_{i,t-1} = y_{i0} + y_{i+} - y_{iT}$.

Note that the first term on the right-hand side of the expression above is constant with respect to $y_i$;
therefore, by substituting (6) in (5) and renormalizing the exponential of the resulting expression,
we obtain the approximation

$$p^*(y_i \mid \alpha_i, X_i, y_{i0}) = \frac{\exp(y_{i+}\alpha_i + \sum_t y_{it} x_{it}'\beta - 0.5 y_{i*}\gamma + y_{i\times}\gamma)}{\sum_z \exp(z_+\alpha_i + \sum_t z_t x_{it}'\beta - 0.5 z_*\gamma + z_\times\gamma)}, \tag{7}$$

where the sum in the denominator ranges over all the binary vectors $z = (z_1, \ldots, z_T)$ of
dimension $T$ and $z_+$, $z_*$ and $z_\times$ are defined in an obvious way with $z_0 = y_{i0}$. The approximating
model is therefore a quadratic exponential model for binary variables (Cox, 1972), in which
the main effect for $y_{it}$ is equal to $\alpha_i + x_{it}'\beta - 0.5\gamma$ when $t = 1, \ldots, T-1$ and to $\alpha_i + x_{it}'\beta$ when
$t = T$ and the two-way interaction effect for $(y_{is}, y_{it})$ is equal to $\gamma$ when $t = s + 1$ and to 0
otherwise.
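To give a feel for the quality of this approximation, the following minimal sketch compares, by enumeration over all response configurations, the dynamic logit probabilities with those of the quadratic exponential model whose log-numerator is the linear part of (5) minus half the sum of the lagged responses times the state dependence parameter. It assumes one scalar covariate per period; function names and parameter values are hypothetical.

```python
import itertools
import math

def true_prob(y, y0, alpha, x, beta, gamma):
    """Dynamic logit probability of the response vector y."""
    prob, y_prev = 1.0, y0
    for y_t, x_t in zip(y, x):
        eta = alpha + x_t * beta + y_prev * gamma
        prob *= math.exp(y_t * eta) / (1.0 + math.exp(eta))
        y_prev = y_t
    return prob

def quad_exp_weight(z, y0, alpha, x, beta, gamma):
    """Unnormalized probability of z under the approximating model."""
    lagged = (y0,) + tuple(z[:-1])
    z_star = sum(lagged)                                # sum of lagged responses
    z_cross = sum(a * b for a, b in zip(lagged, z))     # sum of consecutive products
    linear = sum(z_t * x_t * beta for z_t, x_t in zip(z, x))
    return math.exp(sum(z) * alpha + linear - 0.5 * z_star * gamma + z_cross * gamma)

def approx_prob(y, y0, alpha, x, beta, gamma):
    support = itertools.product((0, 1), repeat=len(x))
    denom = sum(quad_exp_weight(z, y0, alpha, x, beta, gamma) for z in support)
    return quad_exp_weight(tuple(y), y0, alpha, x, beta, gamma) / denom

def max_abs_error(y0, alpha, x, beta, gamma):
    """Largest absolute gap between true and approximating probabilities."""
    return max(abs(true_prob(z, y0, alpha, x, beta, gamma)
                   - approx_prob(z, y0, alpha, x, beta, gamma))
               for z in itertools.product((0, 1), repeat=len(x)))

x = [0.2, -1.1, 0.7]  # hypothetical covariate values
print(max_abs_error(1, 0.1, x, 0.1, 0.1))   # parameters close to 0
print(max_abs_error(1, 2.0, x, 1.0, 2.0))   # parameters far from 0
```

The two models coincide when all parameters are 0, and the gap is small when the parameters are close to the expansion point.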

The above expression closely resembles (3), the main difference being in the denominator,
which in (7) does not depend on $y_i$ and is simply a normalizing constant.
The strong connection between the two models is clarified by the following
Theorem, the proof of which is given in the Appendix.

Theorem 1 For $i = 1, \ldots, n$, the quadratic exponential model (7) implies that the conditional
logit of $y_{it}$, given $\alpha_i$, $X_i$ and $y_{i0}, \ldots, y_{i,t-1}$, is equal to

$$\log\frac{p^*(y_{it}=1 \mid \alpha_i, X_i, y_{i0}, \ldots, y_{i,t-1})}{p^*(y_{it}=0 \mid \alpha_i, X_i, y_{i0}, \ldots, y_{i,t-1})} = \begin{cases} \alpha_i + x_{it}'\beta + y_{i,t-1}\gamma + \log\dfrac{g_{i,t+1}(1)}{g_{i,t+1}(0)} - 0.5\gamma & \text{if } t < T, \\[2mm] \alpha_i + x_{it}'\beta + y_{i,t-1}\gamma & \text{if } t = T, \end{cases} \tag{8}$$

with $g_{i,t+1}(z)$ denoting a function depending on the data only through $x_{i,t+1}, \ldots, x_{iT}$ and such
that $\log[g_{it}(1)/g_{it}(0)] \approx 0.5\gamma$, $t = 2, \ldots, T$, where the approximation is in the sense defined
above.

For $i = 1, \ldots, n$, model (7) also implies that:

(i) $y_{it}$ is conditionally independent of $y_{i0}, \ldots, y_{i,t-2}$ given $\alpha_i$, $X_i$ and $y_{i,t-1}$ ($t = 2, \ldots, T$);

(ii) $y_{it}$ is conditionally independent of $y_{i0}, \ldots, y_{i,t-2}, y_{i,t+2}, \ldots, y_{iT}$ given $\alpha_i$, $X_i$ and
$y_{i,t-1}, y_{i,t+1}$ ($t = 2, \ldots, T-1$).



Note that, for $t = T$, logit (8) has exactly the same parametrization that it has under
the dynamic logit model (2). When $t < T$, this equivalence holds approximately since
$\log[g_{i,t+1}(1)/g_{i,t+1}(0)] \approx 0.5\gamma$. The above Theorem also implies that

$$\log\frac{p^*(y_{it}=1 \mid \alpha_i, X_i, y_{i,t-1}=1)}{p^*(y_{it}=0 \mid \alpha_i, X_i, y_{i,t-1}=1)} - \log\frac{p^*(y_{it}=1 \mid \alpha_i, X_i, y_{i,t-1}=0)}{p^*(y_{it}=0 \mid \alpha_i, X_i, y_{i,t-1}=0)} = \gamma, \quad t = 1, \ldots, T,$$

and then, under the approximating model, $\gamma$ has the same interpretation that it has under the
true model, i.e. log-odds ratio between any consecutive pair of response variables, conditionally
on all the other response variables or marginally with respect to these variables. Moreover,
the approximating model reproduces the same conditional independence relations between the
response variables (see (i) and (ii) above) of the dynamic logit model.
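The interpretation of the interaction parameter as a log-odds ratio holds exactly under the approximating model, and this can be checked numerically. The sketch below, under the assumption of one scalar covariate and with hypothetical names and values, computes the log-odds ratio between two consecutive responses with all the other responses held fixed; the normalizing constant cancels, so only the log-numerator of the quadratic exponential model is needed.

```python
import itertools

def log_weight(z, y0, alpha, x, beta, gamma):
    """Log-numerator of the quadratic exponential model for the configuration z."""
    lagged = (y0,) + tuple(z[:-1])
    linear = sum(z_t * x_t * beta for z_t, x_t in zip(z, x))
    cross = sum(a * b for a, b in zip(lagged, z))
    return sum(z) * alpha + linear - 0.5 * gamma * sum(lagged) + gamma * cross

def conditional_log_odds_ratio(t, others, y0, alpha, x, beta, gamma):
    """Log-odds ratio between y_{t-1} and y_t (t = 2, ..., T), other responses as in `others`."""
    def lw(a, b):
        z = list(others)
        z[t - 2], z[t - 1] = a, b   # positions of y_{t-1} and y_t (0-based list)
        return log_weight(tuple(z), y0, alpha, x, beta, gamma)
    return lw(1, 1) + lw(0, 0) - lw(1, 0) - lw(0, 1)

# Hypothetical values for T = 4.
alpha, beta, gamma = 1.5, -0.8, 0.7
x = [0.3, -0.5, 1.2, 0.1]
for t in (2, 3, 4):
    for others in itertools.product((0, 1), repeat=4):
        value = conditional_log_odds_ratio(t, others, 1, alpha, x, beta, gamma)
        assert abs(value - gamma) < 1e-12
print("log-odds ratio equals gamma for every consecutive pair")
```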

3.2 Conditional approximating model 

The main advantage of the above approximating model with respect to the true one is the
availability of minimal sufficient statistics for the heterogeneity parameters $\alpha_i$. These statistics
are $y_{i+}$, $i = 1, \ldots, n$, which will be referred to as total scores. As we show below, in fact, the
conditional distribution of $y_i$ given $X_i$, $y_{i0}$ and $y_{i+}$ does not depend on $\alpha_i$ for any $i$.
First of all note that, under the approximating model,

$$p^*(y_{i+} \mid \alpha_i, X_i, y_{i0}) = \sum_{z: z_+ = y_{i+}} p^*(z \mid \alpha_i, X_i, y_{i0}) = \frac{\exp(y_{i+}\alpha_i) \sum_{z: z_+ = y_{i+}} \exp(\sum_t z_t x_{it}'\beta - 0.5 z_*\gamma + z_\times\gamma)}{\sum_z \exp(z_+\alpha_i + \sum_t z_t x_{it}'\beta - 0.5 z_*\gamma + z_\times\gamma)},$$

where the sum $\sum_{z: z_+ = y_{i+}}$ is extended to all the binary vectors $z$ such that $z_+ = y_{i+}$. Then, after some
algebra, the conditional distribution at issue becomes

$$p^*(y_i \mid \alpha_i, X_i, y_{i0}, y_{i+}) = \frac{p^*(y_i \mid \alpha_i, X_i, y_{i0})}{p^*(y_{i+} \mid \alpha_i, X_i, y_{i0})} = \frac{\exp(\sum_t y_{it} x_{it}'\beta - 0.5 y_{i*}\gamma + y_{i\times}\gamma)}{\sum_{z: z_+ = y_{i+}} \exp(\sum_t z_t x_{it}'\beta - 0.5 z_*\gamma + z_\times\gamma)}. \tag{9}$$

The expression above does not depend on $\alpha_i$ and therefore may also be denoted by $p^*(y_i \mid X_i, y_{i0}, y_{i+})$.
The same happens for the elements of $\beta$ corresponding to covariates which are time-invariant.
To make this clearer, consider that we can multiply the numerator and the denominator
of (9) by $\exp(-y_{i+} x_{i1}'\beta)$ and, after rearranging terms, obtain

$$p^*(y_i \mid X_i, y_{i0}, y_{i+}) = p^*(y_i \mid D_i, y_{i0}, y_{i+}) = \frac{\exp(\sum_{t>1} y_{it} d_{it}'\beta - 0.5 y_{i*}\gamma + y_{i\times}\gamma)}{\sum_{z: z_+ = y_{i+}} \exp(\sum_{t>1} z_t d_{it}'\beta - 0.5 z_*\gamma + z_\times\gamma)}, \tag{10}$$

with $d_{it} = x_{it} - x_{i1}$ and $D_i = (d_{i2} \cdots d_{iT})$. We consequently assume that $\beta$ does not
include the intercept or parameters for time-invariant covariates, because these
parameters are not identified. The same holds for the approach of Honoré & Kyriazidou
(2000).

In Section 4.1 we will show how the structural parameters in $\theta$ may be estimated by
maximizing a conditional likelihood constructed on the basis of (10).
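The elimination of the heterogeneity parameter can be illustrated by enumeration. In the sketch below (one scalar covariate; names and values are hypothetical), the conditional probability of a response configuration given its total score is computed in two ways: from the approximating joint model, which involves the individual intercept, and from the form based only on covariate differences with respect to the first period. The two agree for any value of the intercept.

```python
import itertools
import math

def lagged_stats(z, y0):
    """Return (sum of lagged responses, sum of products of consecutive responses)."""
    lagged = (y0,) + tuple(z[:-1])
    return sum(lagged), sum(a * b for a, b in zip(lagged, z))

def joint_weight(z, y0, alpha, x, beta, gamma):
    """Unnormalized probability of z under the approximating model."""
    z_star, z_cross = lagged_stats(z, y0)
    linear = sum(z_t * x_t * beta for z_t, x_t in zip(z, x))
    return math.exp(sum(z) * alpha + linear - 0.5 * z_star * gamma + z_cross * gamma)

def same_score(y):
    """All binary vectors with the same total score as y."""
    return [z for z in itertools.product((0, 1), repeat=len(y)) if sum(z) == sum(y)]

def cond_prob_with_alpha(y, y0, alpha, x, beta, gamma):
    num = joint_weight(y, y0, alpha, x, beta, gamma)
    return num / sum(joint_weight(z, y0, alpha, x, beta, gamma) for z in same_score(y))

def cond_prob_from_differences(y, y0, x, beta, gamma):
    d = [x_t - x[0] for x_t in x]   # differences with respect to the first period
    def weight(z):
        z_star, z_cross = lagged_stats(z, y0)
        linear = sum(z_t * d_t * beta for z_t, d_t in zip(z, d))
        return math.exp(linear - 0.5 * z_star * gamma + z_cross * gamma)
    return weight(y) / sum(weight(z) for z in same_score(y))

x, y, y0 = [0.2, -1.1, 0.7], (1, 0, 1), 1   # hypothetical data
print(cond_prob_with_alpha(y, y0, -3.0, x, 1.0, 0.5),
      cond_prob_with_alpha(y, y0, 4.0, x, 1.0, 0.5),
      cond_prob_from_differences(y, y0, x, 1.0, 0.5))
```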

3.3 Improving the approximation 

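The quality of approximation (7) depends on the distance of the parameters from 0, since it
is based on the Taylor series expansion around $\alpha_i = 0$, $\beta = 0$ and $\gamma = 0$ which is reported
in (6). Obviously, when one or more of these parameters are far from 0, the quality of the
approximation may be considerably improved by choosing another point of the parameter
space around which to perform the Taylor series expansion.

Consider, in particular, the following expansion around $\alpha_i = 0$, $\beta = \bar\beta$ and $\gamma = 0$:

$$\sum_t \log[1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma)] \approx \sum_t \{\log[1 + \exp(x_{it}'\bar\beta)] + q_{it}[\alpha_i + x_{it}'(\beta - \bar\beta)]\} + \sum_t q_{it} y_{i,t-1}\gamma,$$

where $\bar\beta$ is any fixed value of $\beta$ and

$$q_{it} = \frac{\exp(x_{it}'\bar\beta)}{1 + \exp(x_{it}'\bar\beta)}. \tag{11}$$

The latter is equal to the probability that $y_{it} = 1$ when the parameters are fixed as above.
This expansion is equal to a component independent of $y_i$ plus $\sum_t q_{it} y_{i,t-1}\gamma$ and so, along the
same lines as in Section 3.1, it results in the following approximating model:

$$p^\dagger(y_i \mid \alpha_i, X_i, y_{i0}) = \frac{\exp(y_{i+}\alpha_i + \sum_t y_{it} x_{it}'\beta - \sum_t q_{it} y_{i,t-1}\gamma + y_{i\times}\gamma)}{\sum_z \exp(z_+\alpha_i + \sum_t z_t x_{it}'\beta - \sum_t q_{it} z_{t-1}\gamma + z_\times\gamma)}. \tag{12}$$

This is a quadratic exponential model which closely resembles the initial approximating model
(7), also in terms of dependence structure between the response variables and interpretation
of the parameters, and such that the total score $y_{i+}$ is still a sufficient statistic for $\alpha_i$. We in
fact have that

$$p^\dagger(y_i \mid \alpha_i, X_i, y_{i0}, y_{i+}) = \frac{\exp(\sum_t y_{it} x_{it}'\beta - \sum_t q_{it} y_{i,t-1}\gamma + y_{i\times}\gamma)}{\sum_{z: z_+ = y_{i+}} \exp(\sum_t z_t x_{it}'\beta - \sum_t q_{it} z_{t-1}\gamma + z_\times\gamma)},$$

which may also be expressed as

$$p^\dagger(y_i \mid D_i, y_{i0}, y_{i+}) = \frac{\exp(\sum_{t>1} y_{it} d_{it}'\beta - \sum_t q_{it} y_{i,t-1}\gamma + y_{i\times}\gamma)}{\sum_{z: z_+ = y_{i+}} \exp(\sum_{t>1} z_t d_{it}'\beta - \sum_t q_{it} z_{t-1}\gamma + z_\times\gamma)}. \tag{13}$$

On the basis of this distribution, we develop a conditional likelihood, the maximization of which
yields an estimator of $\theta$ which should be more efficient than that based on the conditional
distribution of the initial approximating model, provided that $\bar\beta$ is suitably chosen. This
estimator will be illustrated in Section 4.3.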
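The relation between the two conditional distributions can be made concrete with a small sketch, again for one scalar covariate and with hypothetical names and values. The correction term subtracts, for each period, a weight times the lagged response; the initial approximation uses the constant weight 0.5, whereas the improved one uses the logistic probabilities evaluated at the fixed value of the covariate parameter. When this fixed value is 0 the two coincide.

```python
import itertools
import math

def correction_weights(x, beta_bar):
    """Logistic probabilities at the fixed point beta_bar (scalar covariate)."""
    return [1.0 / (1.0 + math.exp(-x_t * beta_bar)) for x_t in x]

def conditional_prob(y, y0, x, beta, gamma, q):
    """Conditional probability of y given its total score, with correction weights q."""
    d = [x_t - x[0] for x_t in x]
    def weight(z):
        lagged = (y0,) + tuple(z[:-1])
        correction = sum(q_t * z_prev for q_t, z_prev in zip(q, lagged))
        cross = sum(z_prev * z_t for z_prev, z_t in zip(lagged, z))
        linear = sum(z_t * d_t * beta for z_t, d_t in zip(z, d))
        return math.exp(linear - correction * gamma + cross * gamma)
    support = [z for z in itertools.product((0, 1), repeat=len(x)) if sum(z) == sum(y)]
    return weight(tuple(y)) / sum(weight(z) for z in support)

x, y, y0 = [0.2, -1.1, 0.7], (1, 0, 1), 1   # hypothetical data
basic = conditional_prob(y, y0, x, 1.0, 0.5, [0.5] * len(x))
improved_at_zero = conditional_prob(y, y0, x, 1.0, 0.5, correction_weights(x, 0.0))
improved = conditional_prob(y, y0, x, 1.0, 0.5, correction_weights(x, 1.0))
print(basic, improved_at_zero, improved)
```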

A natural question that arises at this point is why we still rely on an expansion around a
point of the parameter space at which $\alpha_i = 0$ and $\gamma = 0$, instead of considering a generic
point of type $\alpha_i = \bar\alpha_i$, $\beta = \bar\beta$, $\gamma = \bar\gamma$. The first reason for doing this is that, since within our
approach we do not estimate the parameters $\alpha_i$, which are ruled out by conditioning on the
total scores, we have no way to choose the $\bar\alpha_i$'s in practical applications. We could use another
estimation method to do this, but this would considerably complicate the proposed approach.
Moreover, an expansion around $\gamma = \bar\gamma$ results in a model that, though rather similar to (12),
has sufficient statistics for the incidental parameters $\alpha_i$ which differ from the total scores. On
the other hand, a series of simulations, the results of which are illustrated in Section 5, has
shown that the estimator of $\theta$ obtained by maximizing the conditional likelihood based on
(13) performs considerably better than that obtained by maximizing the conditional likelihood
based on (10). In particular, this estimator has a surprisingly low bias even when samples
are generated from a dynamic logit model of type (2) in which most of the parameters $\alpha_i$
and/or $\gamma$ are far from 0.



4 Approximate conditional inference 

On the basis of distribution (10), we can derive an approximate conditional likelihood for the
dynamic logit model that, for an observed sample $(X_i, y_{i0}, y_i)$, $i = 1, \ldots, n$, has logarithm

$$\ell^*(\theta) = \sum_i \log[p^*(y_i \mid D_i, y_{i0}, y_{i+})] \tag{14}$$

and obviously does not depend on the heterogeneity parameters $\alpha_i$. Since $\log[p^*(y_i \mid D_i, y_{i0}, y_{i+})]$
is always equal to 0 when $y_{i+} = 0$ or $y_{i+} = T$, the response configurations for which this
happens do not contribute to (14). An equivalent expression for $\ell^*(\theta)$ is then

$$\ell^*(\theta) = \sum_i 1\{0 < y_{i+} < T\} \log[p^*(y_i \mid D_i, y_{i0}, y_{i+})]. \tag{15}$$

The actual sample size is then smaller than the nominal one, but it is always larger than that
of the approach of Honoré & Kyriazidou (2000), which is based on a log-likelihood
of type (4). With $T = 3$, for instance, the response configurations omitted from (15) are
(0, 0, 0) and (1, 1, 1), whereas the response configurations (0, 0, 1) and (1, 1, 0) are also omitted
from (4).

In the following, we show how it is possible to estimate $\theta$ by maximizing $\ell^*(\theta)$ and we
study the properties of the resulting estimator under the approximating model and then, by
simulation, under the true model.

4.1 Computing the approximate conditional maximum likelihood estimator

First of all note that distribution (10) may be expressed in the canonical exponential family
form as

$$p^*(y_i \mid D_i, y_{i0}, y_{i+}) = \frac{\exp[u(D_i, y_{i0}, y_i)'\theta]}{C(\theta, D_i, y_{i0}, y_{i+})}, \qquad C(\theta, D_i, y_{i0}, y_{i+}) = \sum_{z: z_+ = y_{i+}} \exp[u(D_i, y_{i0}, z)'\theta],$$

with $u(D_i, y_{i0}, y_i) = (\sum_{t>1} y_{it} d_{it}', -0.5 y_{i*} + y_{i\times})'$. This implies that

$$\log[p^*(y_i \mid D_i, y_{i0}, y_{i+})] = u(D_i, y_{i0}, y_i)'\theta - \log[C(\theta, D_i, y_{i0}, y_{i+})]$$

has first derivative vector and second derivative matrix equal, respectively, to

$$\nabla_\theta \log[p^*(y_i \mid D_i, y_{i0}, y_{i+})] = v(D_i, y_{i0}, y_i) \quad \text{and} \quad \nabla_{\theta\theta} \log[p^*(y_i \mid D_i, y_{i0}, y_{i+})] = -S(D_i, y_{i0}, y_{i+}),$$

where $v(D_i, y_{i0}, y_i) = u(D_i, y_{i0}, y_i) - m(D_i, y_{i0}, y_{i+})$, and with $m(D_i, y_{i0}, y_{i+})$ and $S(D_i, y_{i0}, y_{i+})$
denoting, respectively, the conditional expected value and the conditional variance of $u(D_i, y_{i0}, y_i)$
given $D_i$, $y_{i0}$ and $y_{i+}$ under the approximating model. These are given by

$$m(D_i, y_{i0}, y_{i+}) = \sum_{z: z_+ = y_{i+}} p^*(z \mid D_i, y_{i0}, y_{i+})\, u(D_i, y_{i0}, z),$$

$$S(D_i, y_{i0}, y_{i+}) = \sum_{z: z_+ = y_{i+}} p^*(z \mid D_i, y_{i0}, y_{i+})\, v(D_i, y_{i0}, z)\, v(D_i, y_{i0}, z)'.$$

Consequently, for the conditional log-likelihood $\ell^*(\theta)$ defined in (14), we have score vector

$$s(\theta) = \sum_i v(D_i, y_{i0}, y_i) \tag{16}$$

and observed information matrix

$$J(\theta) = \sum_i S(D_i, y_{i0}, y_{i+}). \tag{17}$$

Note that $J(\theta)$ is always non-negative definite since it corresponds to the sum of a series
of variance-covariance matrices and therefore $\ell^*(\theta)$ is always concave. When the sample size
is large enough, this matrix is almost surely positive definite (see the proof of Theorem 2).
In practical applications, we should therefore find that $\ell^*(\theta)$ is also strictly concave and has a
unique maximum corresponding to the conditional maximum likelihood estimate $\hat\theta = (\hat\beta', \hat\gamma)'$.
This estimate may be found by a simple Newton-Raphson algorithm. At the $h$th step, this
algorithm updates the estimate of $\theta$ at the previous step, $\theta^{(h-1)}$, as

$$\theta^{(h)} = \theta^{(h-1)} + J(\theta^{(h-1)})^{-1} s(\theta^{(h-1)}).$$

Since the parameter space is equal to $\mathbb{R}^{k+1}$, this algorithm is very simple
to implement and usually converges in a few steps to $\hat\theta$, regardless of the starting value $\theta^{(0)}$.
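The estimation procedure just described may be sketched as follows. This is a minimal illustration rather than the Matlab implementation mentioned in Section 1: it assumes a single scalar covariate (so that the parameter vector has two elements and the information matrix is 2 by 2), enumerates the response configurations sharing each subject's total score, and uses simulated data whose design (standard Normal covariate, intercept equal to the covariate mean, logistic initial response) is a hypothetical choice made only to have something to fit.

```python
import itertools
import math
import random

def u_stat(z, y0, d):
    """Statistic u = (sum_t z_t d_t, -0.5 * sum of lags + sum of consecutive products)."""
    lagged = (y0,) + tuple(z[:-1])
    cross = sum(a * b for a, b in zip(lagged, z))
    return (sum(z_t * d_t for z_t, d_t in zip(z, d)), -0.5 * sum(lagged) + cross)

def subject_terms(y, y0, d, theta):
    """Score contribution v and information contribution S of one subject."""
    support = [z for z in itertools.product((0, 1), repeat=len(y)) if sum(z) == sum(y)]
    us = [u_stat(z, y0, d) for z in support]
    ws = [math.exp(u[0] * theta[0] + u[1] * theta[1]) for u in us]
    c = sum(ws)
    m = [sum(w * u[j] for w, u in zip(ws, us)) / c for j in range(2)]
    s_mat = [[sum(w * (u[a] - m[a]) * (u[b] - m[b]) for w, u in zip(ws, us)) / c
              for b in range(2)] for a in range(2)]
    u_obs = u_stat(tuple(y), y0, d)
    return [u_obs[0] - m[0], u_obs[1] - m[1]], s_mat

def score_and_info(data, theta):
    """Score vector and observed information, summed over subjects."""
    s, info = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for y, y0, d in data:
        v, s_mat = subject_terms(y, y0, d, theta)
        for a in range(2):
            s[a] += v[a]
            for b in range(2):
                info[a][b] += s_mat[a][b]
    return s, info

def newton_raphson(data, theta=(0.0, 0.0), tol=1e-8, max_iter=100):
    """Iterate theta <- theta + J^{-1} s until the score is numerically zero."""
    theta = list(theta)
    for _ in range(max_iter):
        s, info = score_and_info(data, theta)
        if max(abs(s[0]), abs(s[1])) < tol:
            break
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        theta[0] += (info[1][1] * s[0] - info[0][1] * s[1]) / det
        theta[1] += (info[0][0] * s[1] - info[1][0] * s[0]) / det
    return theta, info

def simulate(n, T, beta, gamma, rng):
    """Hypothetical data generator from a dynamic logit model (illustration only)."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
        alpha = sum(x) / (T + 1)
        draw = lambda eta: int(rng.random() < 1.0 / (1.0 + math.exp(-eta)))
        y0 = draw(alpha + x[0] * beta)
        y, y_prev = [], y0
        for t in range(1, T + 1):
            y_prev = draw(alpha + x[t] * beta + y_prev * gamma)
            y.append(y_prev)
        data.append((tuple(y), y0, [x[t] - x[1] for t in range(1, T + 1)]))
    return data

data = simulate(500, 3, 1.0, 0.5, random.Random(0))
theta_hat, info_hat = newton_raphson(data)
print(theta_hat)
```

Subjects whose total score is 0 or T need no special handling here: their support contains a single configuration, so their score and information contributions are exactly zero. The enumeration grows as 2 to the power T and is meant for short panels only.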



4.2 Asymptotic properties under the approximating model 

Suppose that the individuals in the sample are independent of each other with $\alpha_i$, $X_i$, $y_{i0}$
and $y_i$ drawn, for $i = 1, \ldots, n$, from the model

$$f_0(\alpha, X, y_0, y) = f_0(\alpha, X, y_0)\, p_0^*(y \mid \alpha, X, y_0), \tag{18}$$

where $f_0(\alpha, X, y_0)$ denotes the joint distribution of heterogeneity effect (which is not observed),
covariates and initial observation and $p_0^*(y \mid \alpha, X, y_0)$ denotes the conditional distribution of
the response variables under the approximating quadratic exponential model (7) when $\theta = \theta_0$,
with $\theta_0$ denoting the true value of its structural parameters.

Under very mild conditions on the distribution of the covariates, we have that $\hat\theta$ exists, is
a $\sqrt{n}$-consistent estimator of $\theta_0$ and has an asymptotic Normal distribution as $n \to \infty$. These
results are stated more precisely in the following Theorem, where $E_0(\cdot)$ denotes the expected
value under the true model (18). As we show in the Appendix, the Theorem may be proved on
the basis of standard asymptotic results (see, for instance, Newey & McFadden, 1994).

Theorem 2 Assume that the distribution $f_0(\alpha, X, y_0)$ is such that $E_0(DD')$ exists and is of
full rank, with $D = (x_2 - x_1 \cdots x_T - x_1)$. Then, for $T \ge 2$, we have that:

• (Existence) $\hat\theta$ exists with probability approaching 1 as $n \to \infty$;

• (Consistency) $\hat\theta \xrightarrow{p} \theta_0$;

• (Normality) $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, J_0^{-1})$, with $J_0 = E_0[S(D, y_0, y_+)]$.

On the basis of the maximum likelihood estimator $\hat\theta$, we can consistently estimate the
matrix $J_0$ as

$$\hat J = \frac{1}{n} J(\hat\theta) = \frac{1}{n} \sum_i \hat S(D_i, y_{i0}, y_{i+}),$$

where $\hat S(D_i, y_{i0}, y_{i+})$ is the variance-covariance matrix of the $i$th score component, computed
under the estimated model. The standard errors of the elements of $\hat\theta$ are then estimated by
the square roots of the corresponding diagonal elements of $(n\hat J)^{-1}$. This directly derives from
Newey & McFadden (1994, Sec. 4.2). Note that $n\hat J$ is equal to $J(\hat\theta)$ and so it is obtained as
a by-product of the Newton-Raphson algorithm described in Section 4.1.

Because of the asymptotic normality of $\hat\theta$, it is also possible to construct an approximate
$(1 - \alpha)$-level confidence interval for any parameter $\beta_h$ in $\beta$ and for $\gamma$ as follows:

$$\hat\beta_h \mp z_{\alpha/2}\, se(\hat\beta_h) \quad \text{and} \quad \hat\gamma \mp z_{\alpha/2}\, se(\hat\gamma), \tag{19}$$

where $se(\cdot)$ denotes the standard error estimated as above and $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile
of the standard Normal distribution.
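A minimal sketch of this computation is given below for the case of two structural parameters (one covariate parameter and the state dependence parameter), so that the information matrix can be inverted in closed form. The function name and the numerical values are hypothetical.

```python
import math
from statistics import NormalDist

def wald_intervals(theta_hat, info, level=0.95):
    """Intervals estimate -/+ z * se, with se from the inverse of a 2x2 information matrix."""
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    variances = [info[1][1] / det, info[0][0] / det]   # diagonal of the inverse
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return [(est - z * math.sqrt(var), est + z * math.sqrt(var))
            for est, var in zip(theta_hat, variances)]

# Hypothetical estimate and information matrix evaluated at the estimate.
intervals = wald_intervals([1.0, 0.5], [[4.0, 0.0], [0.0, 16.0]])
print(intervals)
```

Here the information matrix passed to the function plays the role of the matrix obtained at the last Newton-Raphson step, so no separate estimation step is required.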

We must again recall that the results above hold under the approximating quadratic
exponential model. Therefore, these results hold approximately under the dynamic logit model,
with the quality of the approximation depending on the distance between the two models.
To study these properties more precisely under the logit model, we performed a simulation
study along the same lines as Honoré & Kyriazidou (2000). The results of this simulation study
are illustrated in Section 5.

4.3 Improved approximate conditional estimator 

Once an estimate $\hat\theta$ of $\theta$ is obtained by maximizing the log-likelihood $\ell^*(\theta)$, an improved
estimate may be obtained by maximizing

$$\ell^\dagger(\theta) = \sum_i \log[p^\dagger(y_i \mid D_i, y_{i0}, y_{i+})],$$

with $p^\dagger(y_i \mid D_i, y_{i0}, y_{i+})$ denoting the approximating distribution derived in (13) with $\bar\beta = \hat\beta$.
We expect an improvement since the distribution $p^\dagger(y_i \mid \alpha_i, X_i, y_{i0})$ should be a better
approximation of the true distribution $p(y_i \mid \alpha_i, X_i, y_{i0})$ than $p^*(y_i \mid \alpha_i, X_i, y_{i0})$. We recall
that the main difference between the two approximating distributions is in the correction
factor $0.5 y_{i*}\gamma$ that in $p^\dagger(y_i \mid \alpha_i, X_i, y_{i0})$, and thus also in $p^\dagger(y_i \mid D_i, y_{i0}, y_{i+})$, is replaced by
$\sum_t q_{it} y_{i,t-1}\gamma$, with $q_{it}$ defined in (11).

Maximization of $\ell^\dagger(\theta)$ with respect to $\theta$ may be performed on the basis of the same iterative
algorithm outlined at the end of Section 4.1. The only difference is in the computation of the
score vector and the information matrix, which are still defined, respectively, as in (16) and
(17), but with $u(D_i, y_{i0}, y_i) = (\sum_{t>1} y_{it} d_{it}', -\sum_t q_{it} y_{i,t-1} + y_{i\times})'$. Provided that the sample is
large enough, $\ell^\dagger(\theta)$ is also almost surely a strictly concave function of $\theta$. This ensures that,
in practical applications, the iterative algorithm converges very easily to the maximum of this
function.

In the algorithm above, the vector $\bar\beta$ used to compute the probabilities $q_{it}$
of the approximating model is held fixed at every iteration. However, it may be reasonable to
update $\bar\beta$ at every step of the algorithm with the estimate of $\beta$ obtained at the end of the previous
step. In practice, this means that the quantities $q_{it}$ are dynamic and not fixed. As we observed,
this algorithm also usually converges very quickly. We denote the value of $\theta$ at convergence by
$\tilde\theta = (\tilde\beta', \tilde\gamma)'$. To understand whether $\tilde\theta$ represents a real improvement over $\hat\theta$ as an estimator of $\theta$, we
compared the two estimators by simulation (see Section 5). Standard errors for the elements
of $\tilde\theta$ may be estimated on the basis of $(n\tilde J)^{-1}$, where $n\tilde J$ is an estimate of the information matrix
at $\tilde\theta$ which is directly produced by the above iterative algorithm. From these standard errors
it is possible to construct approximate confidence intervals for the elements of $\theta$ as described in the previous
section, i.e.

$$\tilde\beta_h \mp z_{\alpha/2}\, se(\tilde\beta_h) \quad \text{and} \quad \tilde\gamma \mp z_{\alpha/2}\, se(\tilde\gamma). \tag{20}$$

5 Simulation study of the proposed estimators 

In this section, we illustrate a simulation study carried out to assess the finite sample
properties of the proposed estimators under the dynamic logit model (2). To make our
results comparable with the previous literature, we followed the same simulation
design adopted by Honoré & Kyriazidou (2000), to whom we refer for a more detailed
description of this design. The results concern both the estimator $\hat\theta$, built on the basis of
the initial approximation and described in Section 4.1 (basic conditional estimator, for short),
and the estimator $\tilde\theta$, built on the basis of the improved approximation and illustrated in
Section 4.3 (improved conditional estimator, for short). These results also concern the confidence
intervals that may be constructed around these estimators following (19) and (20).



5.1 Benchmark design 

Under the benchmark design of Honore & Kyriazidou (2000), samples of different dimension
($n$ = 250, 500, 1000, 2000, 4000) are initially generated from a dynamic logit model for $T = 3$
time occasions, with only one covariate and parameters $\beta = 1$ and $\gamma = 0.5$. The covariate
is generated by drawing any $x_{it}$ ($i = 1, \ldots, n$, $t = 0, \ldots, T$) from a Normal
distribution with mean 0 and variance $\pi^2/3$, while any $\alpha_i$ ($i = 1, \ldots, n$) is
generated as $(x_{i0} + \sum_t x_{it})/(T+1)$. To study the sensitivity of the results to $T$ and
$\gamma$, Honore & Kyriazidou (2000) then considered a number of time occasions $T$ equal to 7
and different values of $\gamma$ (0.25, 1, 2).
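
As a rough illustration of this design, the sketch below generates one sample from the dynamic logit model with a single covariate. The function names are hypothetical, and the way the initial response $y_{i0}$ is drawn (from the same logit model without a lagged term) is an assumption of this sketch, since the text above does not specify it.

```python
import math
import random


def simulate_benchmark(n, T, beta, gamma, seed=0):
    """Draw n units; each unit is a pair (x, y) with x and y indexed by t = 0, ..., T."""
    rng = random.Random(seed)
    sd_x = math.pi / math.sqrt(3.0)  # standard deviation giving variance pi^2/3
    panel = []
    for _ in range(n):
        x = [rng.gauss(0.0, sd_x) for _ in range(T + 1)]
        alpha = sum(x) / (T + 1)     # individual effect: average of x_i0, ..., x_iT
        y = []
        for t in range(T + 1):
            # Assumption: y_i0 is drawn without a state-dependence term.
            lagged = gamma * y[t - 1] if t > 0 else 0.0
            prob = 1.0 / (1.0 + math.exp(-(alpha + beta * x[t] + lagged)))
            y.append(1 if rng.random() < prob else 0)
        panel.append((x, y))
    return panel


def actual_sample_ratio(panel):
    """Share of units with 0 < y_+ < T, where y_+ sums the responses for t = 1, ..., T."""
    T = len(panel[0][1]) - 1
    informative = sum(1 for _, y in panel if 0 < sum(y[1:]) < T)
    return informative / len(panel)


panel = simulate_benchmark(n=2000, T=3, beta=1.0, gamma=0.5, seed=1)
ratio = actual_sample_ratio(panel)
```

Only units with a total score strictly between 0 and $T$ contribute to a conditional likelihood, which is why `actual_sample_ratio` is reported alongside the nominal sample size.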

Within our simulation study, we generated 1000 samples from each of the models described
above and, for each sample, we estimated $\beta$ and $\gamma$. For both parameters we also
constructed a 95% and an 80% confidence interval. The results in terms of mean bias, root mean
squared error (RMSE), median bias and median absolute error (MAE) of the estimators are displayed
in Tables 1 and 2. For any $\gamma$, these tables also show the ratio^ between the actual sample
size and the nominal sample size $n$. The results in terms of actual coverage level of the
confidence intervals are displayed in Table 3.
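
The four performance measures and the coverage level can be computed from the simulated estimates as in the following sketch. The function names are hypothetical; the input is assumed to be the list of estimates (or intervals) obtained across the simulated samples.

```python
import math
from statistics import mean, median


def summarize(estimates, true_value):
    """Mean bias, RMSE, median bias and median absolute error (MAE) of an estimator."""
    errors = [est - true_value for est in estimates]
    return {
        "mean_bias": mean(errors),
        "rmse": math.sqrt(mean(err * err for err in errors)),
        "median_bias": median(errors),
        "mae": median(abs(err) for err in errors),
    }


def coverage(intervals, true_value):
    """Proportion of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)
```

Note that MAE here denotes the median, not the mean, of the absolute errors, as stated in the text.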

As regards the bias of the basic estimator $\hat\beta$, from Tables 1 and 2 we can see that
this bias is always moderate when $T = 3$ and is negligible when $T = 7$. As regards
the efficiency of $\hat\beta$, we note that both the RMSE and the MAE of this estimator decrease
as $n$ and $T$ grow. In particular, they decrease with $n$ at a rate close to $\sqrt{n}$ and much
faster with $T$. This is because the number of observations that contribute to the approximate
conditional likelihood increases more than proportionally with $T$, since an increase of $T$ also
determines an increase of the actual sample size. Moreover, both RMSE and MAE increase
with $\gamma$. This is mainly due to the fact that an increase of $\gamma$, when this is positive,
implies a reduction of the actual sample size, while the approximation on which our approach is
based becomes less sharp. A completely different scenario may be seen for the basic estimator
$\hat\gamma$, which is always downward biased. Its bias is not negligible in most of the cases under
consideration and tends to increase with $\gamma$ and, surprisingly, with $n$ and $T$. The
dependence on $n$ is much stronger for $T = 3$ than for $T = 7$. This bias obviously has a negative
effect on

^ It is computed as the expected proportion of response configurations such that $0 < y_{i+} < T$.



Table 1: Performance of the basic and improved conditional estimators under some benchmark
simulation designs with T = 3. Percentages in parentheses refer to the ratio between the actual
sample size and the nominal one.

                                Estimation of $\beta$          Estimation of $\gamma$
                              Mean          Median            Mean          Median
$\gamma$  n     Estimator     Bias    RMSE    Bias     MAE    Bias    RMSE    Bias     MAE
0.25      250   Basic        0.039   0.144   0.025   0.110  -0.033   0.374  -0.036   0.299
(60%)           Improved     0.026   0.142   0.010   0.110  -0.017   0.360  -0.029   0.286
          500   Basic        0.024   0.096   0.017   0.075  -0.038   0.274  -0.033   0.221
                Improved     0.010   0.093   0.003   0.073  -0.013   0.265  -0.012   0.213
          1000  Basic        0.020   0.069   0.016   0.054  -0.034   0.191  -0.035   0.156
                Improved     0.005   0.066   0.002   0.053  -0.007   0.183  -0.011   0.146
          2000  Basic        0.019   0.048   0.017   0.038  -0.040   0.134  -0.043   0.108
                Improved     0.004   0.045   0.002   0.035  -0.012   0.125  -0.011   0.099
          4000  Basic        0.016   0.036   0.017   0.029  -0.040   0.101  -0.040   0.081
                Improved     0.001   0.033   0.001   0.026  -0.011   0.090  -0.011   0.072
0.5       250   Basic        0.055   0.155   0.035   0.116  -0.067   0.390  -0.079   0.313
(57%)           Improved     0.027   0.146   0.010   0.111  -0.026   0.361  -0.027   0.285
          500   Basic        0.036   0.102   0.030   0.079  -0.070   0.288  -0.064   0.233
                Improved     0.008   0.094   0.003   0.074  -0.021   0.272  -0.020   0.219
          1000  Basic        0.033   0.075   0.029   0.059  -0.069   0.208  -0.072   0.167
                Improved     0.005   0.066   0.002   0.053  -0.017   0.189  -0.017   0.148
          2000  Basic        0.031   0.057   0.028   0.045  -0.074   0.152  -0.078   0.123
                Improved     0.003   0.047   0.001   0.037  -0.020   0.130  -0.013   0.103
          4000  Basic        0.028   0.043   0.028   0.035  -0.077   0.122  -0.077   0.099
                Improved     0.000   0.033   0.000   0.027  -0.023   0.095  -0.021   0.077
1         250   Basic        0.081   0.179   0.062   0.134  -0.117   0.443  -0.120   0.352
(52%)           Improved     0.029   0.154   0.012   0.116  -0.035   0.405  -0.039   0.319
          500   Basic        0.060   0.120   0.055   0.094  -0.127   0.333  -0.134   0.268
                Improved     0.011   0.101   0.005   0.079  -0.038   0.294  -0.043   0.234
          1000  Basic        0.053   0.090   0.050   0.072  -0.127   0.249  -0.132   0.202
                Improved     0.004   0.070   0.002   0.056  -0.034   0.203  -0.040   0.161
          2000  Basic        0.050   0.071   0.046   0.057  -0.137   0.200  -0.143   0.165
                Improved     0.001   0.048  -0.003   0.038  -0.042   0.146  -0.043   0.116
          4000  Basic        0.047   0.059   0.046   0.050  -0.140   0.174  -0.143   0.149
                Improved    -0.002   0.034  -0.002   0.027  -0.044   0.110  -0.045   0.089
2         250   Basic        0.119   0.234   0.084   0.169  -0.144   0.592  -0.168   0.471
(42%)           Improved     0.040   0.185   0.015   0.139  -0.030   0.526  -0.056   0.419
          500   Basic        0.086   0.154   0.070   0.116  -0.196   0.423  -0.216   0.345
                Improved     0.014   0.119   0.000   0.092  -0.060   0.358  -0.078   0.286
          1000  Basic        0.070   0.108   0.065   0.086  -0.200   0.326  -0.200   0.264
                Improved    -0.003   0.078  -0.008   0.062  -0.073   0.252  -0.083   0.200
          2000  Basic        0.065   0.087   0.062   0.070  -0.211   0.279  -0.213   0.235
                Improved    -0.007   0.055  -0.009   0.044  -0.078   0.191  -0.081   0.154
          4000  Basic        0.066   0.078   0.066   0.067  -0.211   0.247  -0.217   0.217
                Improved    -0.006   0.039  -0.006   0.031  -0.079   0.148  -0.079   0.120

Table 2: Performance of the basic and improved conditional estimators under some benchmark
simulation designs with T = 7. Percentages in parentheses refer to the ratio between the actual
sample size and the nominal one.

                                Estimation of $\beta$          Estimation of $\gamma$
                              Mean          Median            Mean          Median
$\gamma$  n     Estimator     Bias    RMSE    Bias     MAE    Bias    RMSE    Bias     MAE
0.25      250   Basic        0.011   0.060   0.008   0.047  -0.057   0.151  -0.058   0.120
(92%)           Improved     0.006   0.059   0.003   0.047  -0.006   0.152  -0.011   0.123
          500   Basic        0.009   0.043   0.007   0.034  -0.056   0.115  -0.057   0.092
                Improved     0.003   0.042   0.002   0.033  -0.002   0.110  -0.002   0.088
          1000  Basic        0.004   0.030   0.004   0.024  -0.056   0.090  -0.056   0.074
                Improved    -0.001   0.030  -0.001   0.024  -0.007   0.079  -0.007   0.062
          2000  Basic        0.006   0.022   0.006   0.018  -0.057   0.075  -0.055   0.063
                Improved     0.001   0.021   0.000   0.017  -0.006   0.052  -0.006   0.042
          4000  Basic        0.006   0.016   0.007   0.013  -0.056   0.065  -0.055   0.057
                Improved     0.000   0.015   0.001   0.012  -0.005   0.038  -0.006   0.031
0.5       250   Basic        0.014   0.063   0.009   0.049  -0.111   0.180  -0.113   0.147
(91%)           Improved     0.006   0.061   0.001   0.049  -0.008   0.153  -0.009   0.124
          500   Basic        0.012   0.044   0.009   0.035  -0.112   0.151  -0.115   0.127
                Improved     0.003   0.042   0.001   0.034  -0.007   0.111  -0.009   0.089
          1000  Basic        0.007   0.030   0.007   0.024  -0.112   0.133  -0.113   0.116
                Improved    -0.001   0.029  -0.001   0.024  -0.012   0.082  -0.013   0.066
          2000  Basic        0.009   0.024   0.009   0.019  -0.112   0.122  -0.110   0.112
                Improved     0.000   0.022   0.000   0.017  -0.010   0.054  -0.011   0.043
          4000  Basic        0.009   0.018   0.009   0.014  -0.111   0.116  -0.110   0.111
                Improved     0.001   0.015   0.001   0.012  -0.009   0.039  -0.010   0.032
1         250   Basic        0.012   0.065   0.007   0.051  -0.220   0.264  -0.227   0.229
(87%)           Improved     0.006   0.063   0.002   0.050  -0.020   0.157  -0.018   0.124
          500   Basic        0.010   0.045   0.009   0.036  -0.218   0.243  -0.218   0.221
                Improved     0.004   0.044   0.004   0.035  -0.015   0.120  -0.014   0.095
          1000  Basic        0.006   0.031   0.005   0.025  -0.219   0.232  -0.221   0.219
                Improved    -0.001   0.030  -0.001   0.024  -0.021   0.087  -0.021   0.069
          2000  Basic        0.007   0.023   0.005   0.018  -0.218   0.224  -0.217   0.218
                Improved     0.000   0.022  -0.001   0.017  -0.018   0.059  -0.018   0.047
          4000  Basic        0.007   0.017   0.007   0.014  -0.219   0.222  -0.219   0.219
                Improved     0.000   0.016   0.000   0.013  -0.021   0.045  -0.021   0.036
2         250   Basic       -0.017   0.072  -0.022   0.058  -0.423   0.456  -0.431   0.425
(76%)           Improved     0.007   0.071   0.001   0.055  -0.065   0.191  -0.072   0.156
          500   Basic       -0.020   0.052  -0.023   0.042  -0.421   0.439  -0.423   0.421
                Improved     0.003   0.049   0.001   0.039  -0.058   0.151  -0.060   0.122
          1000  Basic       -0.024   0.041  -0.024   0.034  -0.426   0.435  -0.426   0.426
                Improved    -0.001   0.035  -0.002   0.028  -0.064   0.116  -0.066   0.095
          2000  Basic       -0.024   0.034  -0.025   0.028  -0.425   0.430  -0.424   0.425
                Improved    -0.001   0.024  -0.002   0.019  -0.064   0.092  -0.065   0.077
          4000  Basic       -0.024   0.030  -0.024   0.026  -0.428   0.430  -0.428   0.428
                Improved    -0.001   0.017  -0.001   0.014  -0.066   0.081  -0.066   0.069



Table 3: Coverage levels of the confidence intervals (CI) based on the basic and improved
conditional estimators under some benchmark simulation designs.

                                         T = 3                               T = 7
                            CI for $\beta$   CI for $\gamma$    CI for $\beta$   CI for $\gamma$
$\gamma$  n     Method          95%      80%      95%      80%      95%      80%      95%      80%
0.25      250   Basic         0.944    0.802    0.947    0.812    0.944    0.801    0.931    0.754
                Improved      0.950    0.808    0.950    0.802    0.949    0.804    0.958    0.797
          500   Basic         0.945    0.814    0.953    0.798    0.944    0.788    0.913    0.738
                Improved      0.955    0.823    0.955    0.798    0.944    0.792    0.952    0.791
          1000  Basic         0.932    0.791    0.952    0.794    0.949    0.800    0.870    0.663
                Improved      0.950    0.807    0.944    0.805    0.956    0.796    0.953    0.800
          2000  Basic         0.920    0.765    0.945    0.765    0.940    0.789    0.769    0.548
                Improved      0.946    0.809    0.956    0.782    0.948    0.804    0.955    0.797
          4000  Basic         0.939    0.751    0.936    0.754    0.932    0.763    0.627    0.380
                Improved      0.955    0.798    0.950    0.797    0.955    0.790    0.947    0.810
0.5       250   Basic         0.928    0.802    0.948    0.798    0.940    0.798    0.883    0.658
                Improved      0.946    0.825    0.943    0.804    0.953    0.802    0.951    0.813
          500   Basic         0.937    0.811    0.951    0.793    0.934    0.783    0.813    0.564
                Improved      0.952    0.814    0.955    0.793    0.941    0.800    0.949    0.801
          1000  Basic         0.916    0.758    0.946    0.769    0.953    0.796    0.658    0.396
                Improved      0.953    0.787    0.951    0.810    0.953    0.802    0.946    0.787
          2000  Basic         0.896    0.734    0.913    0.745    0.928    0.778    0.380    0.159
                Improved      0.952    0.799    0.950    0.781    0.945    0.801    0.952    0.810
          4000  Basic         0.878    0.666    0.867    0.669    0.918    0.720    0.131    0.029
                Improved      0.959    0.798    0.941    0.782    0.950    0.802    0.948    0.797
1         250   Basic         0.919    0.796    0.941    0.792    0.941    0.800    0.675    0.412
                Improved      0.946    0.827    0.949    0.799    0.945    0.794    0.947    0.798
          500   Basic         0.912    0.763    0.937    0.763    0.938    0.800    0.479    0.226
                Improved      0.946    0.811    0.948    0.793    0.946    0.807    0.950    0.791
          1000  Basic         [values illegible in the source]
                Improved      0.961    0.793    0.949    0.802    0.945    0.808    0.945    0.789
          2000  Basic         0.833    0.629    0.847    0.627    0.941    0.787    0.009    0.002
                Improved      0.949    0.819    0.939    0.791    0.952    0.799    0.936    0.771
          4000  Basic         0.746    0.485    0.729    0.469    0.938    0.758    0.000    0.000
                Improved      0.955    0.788    0.928    0.757    0.948    0.808    0.921    0.751
2         250   Basic         0.903    0.785    0.948    0.788    0.940    0.794    0.286    0.118
                Improved      0.944    0.830    0.947    0.825    0.940    0.833    0.941    0.755
          500   Basic         0.892    0.752    0.921    0.737    0.926    0.771    0.090    0.021
                Improved      0.952    0.830    0.946    0.805    0.946    0.808    0.931    0.749
          1000  Basic         0.855    0.697    0.891    0.675    0.879    0.687    0.004    0.001
                Improved      0.956    0.797    0.937    0.787    0.946    0.799    0.896    0.703
          2000  Basic         0.796    0.607    0.790    0.542    0.824    0.592    0.000    0.000
                Improved      0.961    0.785    0.929    0.761    0.955    0.799    0.840    0.616
          4000  Basic         0.654    0.380    0.625    0.357    0.708    0.441    0.000    0.000
                Improved      0.948    0.781    0.902    0.697    0.948    0.798    0.686    0.445



the efficiency of the estimator. More precisely, both RMSE and MAE decrease as $n$ grows at
a rate much slower than $\sqrt{n}$, especially when $T$ and $\gamma$ are large. With $T = 7$ and
$\gamma = 2$, for instance, the MAE of $\hat\gamma$ is close to being constant with respect to $n$
and may be larger than that for the case in which $T = 3$ and $\gamma = 2$.

As regards the improved estimators $\tilde\beta$ and $\tilde\gamma$, Tables 1 and 2 show that these
estimators perform, in terms of bias and efficiency, much better than the basic estimators
illustrated above. In particular, $\tilde\beta$ has a bias which is always negligible and its gain in
terms of efficiency with respect to $\hat\beta$ increases with $n$ and $\gamma$ and does not seem to
be strongly affected by $T$. With $T = 3$, for instance, $\hat\beta$ and $\tilde\beta$ have the same
MAE when $n = 250$ and $\gamma = 0.25$, but the MAE of the first estimator is more than double that
of the second estimator when $n = 4000$ and $\gamma = 2$. The advantage of the improved estimator
$\tilde\gamma$ over the basic estimator $\hat\gamma$ is even more evident. Even though $\tilde\gamma$
is downward biased, its bias is almost always moderate and seems to increase very slowly with $n$
and $\gamma$ and to decrease as $T$ grows. Moreover, both RMSE and MAE of $\tilde\gamma$ decrease as
$n$ grows at a rate close to $\sqrt{n}$ and much faster in $T$, and increase with $\gamma$. The gain
in terms of efficiency of $\tilde\gamma$ over $\hat\gamma$ increases with $n$, $T$ and $\gamma$.
When $T = 3$, $n = 250$ and $\gamma = 0.25$, for instance, the median bias and the MAE of
$\tilde\gamma$ are equal respectively to -0.029 and 0.286 whereas, for $\hat\gamma$, they are equal
respectively to -0.036 and 0.299. When $T = 7$, $n = 4000$ and $\gamma = 2$, instead, the median
bias and the MAE of $\tilde\gamma$ are equal respectively to -0.066 and 0.069, whereas for
$\hat\gamma$ they are equal respectively to -0.428 and 0.428.
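
The statements about convergence rates can be checked directly on the tabulated values. The sketch below (the function name is hypothetical) computes the exponent $r$ for which the error shrinks like $n^{-r}$ between two sample sizes; a value near 0.5 corresponds to the $\sqrt{n}$ rate. The RMSE values are copied from Tables 1 and 2.

```python
import math


def empirical_rate(n_small, err_small, n_large, err_large):
    """Exponent r such that the error behaves like n**(-r) between two sample sizes."""
    return math.log(err_small / err_large) / math.log(n_large / n_small)


# Improved estimator of beta, T = 3, gamma = 0.5 (RMSE at n = 250 and n = 4000, Table 1).
rate_improved_beta = empirical_rate(250, 0.146, 4000, 0.033)

# Basic estimator of gamma, T = 7, gamma = 2 (RMSE at n = 250 and n = 4000, Table 2).
rate_basic_gamma = empirical_rate(250, 0.456, 4000, 0.430)
```

Under these inputs the first exponent is close to 0.5, whereas the second is close to zero because the RMSE of the basic estimator of $\gamma$ is dominated by a bias that does not vanish with $n$.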

The superiority of the improved estimators over the basic estimators is confirmed by the
behavior of the confidence intervals constructed around these estimators. In particular, as
may be deduced from Table 3, the actual coverage level of the confidence intervals for $\beta$ based
on $\hat\beta$ (see (19)) tends to decrease with $n$ and $\gamma$ and to increase with $T$. In
practice, the actual coverage level is significantly smaller than the nominal level only when
$T = 3$ and $\gamma \geq 1$. The confidence intervals based on $\tilde\beta$ (see (20)) behave even
better, with an actual coverage level which is always very close to the nominal one. Similar
conclusions may be drawn about the confidence intervals for $\gamma$. In this case, however, the
actual coverage level of the confidence interval based on $\hat\gamma$ may be completely inadequate;
this is mainly due to the bias of this estimator. We have a strong improvement with the confidence
intervals based on $\tilde\gamma$, even though the latter may also not be wide enough when $\gamma$
is large. With $T = 7$, $n = 1000$ and $\gamma = 2$, for instance, the 95% confidence interval based
on $\hat\gamma$ has a coverage level of 0.004, whereas that of the confidence interval based on
$\tilde\gamma$ is equal to 0.896.

5.2 Other designs 

Following Honore & Kyriazidou (2000), we considered other simulation designs based on the
same dynamic logit model used in the benchmark design with $T = 3$, $\gamma = 0.5$ and $\beta = 1$.
In particular, we considered the following designs:

• $\chi^2(1)$ regressor: the only difference with respect to the benchmark design is that any
$x_{it}$ ($i = 1, \ldots, n$, $t = 0, \ldots, T$) is generated from a $\chi^2(1)$ distribution
transformed to have mean 0 and variance $\pi^2/3$;

• additional regressors: samples are generated as in the benchmark design, but three
more covariates are used in the estimation of the parameters. These covariates, which
obviously have no real effect on the response variables, are generated from the same
Normal distribution used to generate $x_{it}$;

• trending regressors, $T = 3$: the only difference with respect to the benchmark design is
that the covariate is generated as $x_{it} = \phi(\psi + 0.1t + \zeta_{it})$, with $\phi$ and $\psi$
suitably chosen and where $\zeta_{i0}, \ldots, \zeta_{iT}$ follow a Gaussian AR(1) process with
autoregressive coefficient equal to 0.5, normalized to have variance $\pi^2/3$;

• trending regressors, $T = 7$: as in the previous design, but with $T = 7$.
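
The two non-Normal ingredients of these designs can be sketched as follows. The function names are hypothetical. For the AR(1) component, the sketch assumes that the normalization applies to the marginal variance of the process; the constants $\phi$ and $\psi$ are not given in the text and are therefore left out.

```python
import math
import random

TARGET_VAR = math.pi ** 2 / 3.0


def chi2_regressor(rng):
    """Chi-square(1) draw, centred and rescaled to mean 0 and variance pi^2/3."""
    draw = rng.gauss(0.0, 1.0) ** 2              # chi-square(1): mean 1, variance 2
    return (draw - 1.0) * math.sqrt(TARGET_VAR / 2.0)


def ar1_path(rng, length, rho=0.5):
    """Stationary Gaussian AR(1) path whose marginal variance is pi^2/3 (assumption)."""
    innovation_sd = math.sqrt(TARGET_VAR * (1.0 - rho ** 2))
    path = [rng.gauss(0.0, math.sqrt(TARGET_VAR))]  # start from the stationary law
    for _ in range(length - 1):
        path.append(rho * path[-1] + rng.gauss(0.0, innovation_sd))
    return path


rng = random.Random(0)
draws = [chi2_regressor(rng) for _ in range(50000)]
zeta = ar1_path(rng, length=4)  # zeta_i0, ..., zeta_iT with T = 3
```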

The results in terms of mean bias, RMSE, median bias and MAE are displayed in Table 4,
while the results in terms of actual coverage level of the confidence intervals are displayed
in Table 5. Given their superiority over the basic estimators, the results concern only the
improved estimators $\tilde\beta$ and $\tilde\gamma$ and the confidence intervals based on these
estimators.

On the basis of the results in Table 4 we can conclude that the improved estimators do not
behave considerably differently from how they behave under the benchmark design. Even when the



Table 4: Performance of the improved conditional estimator under different simulation designs.
Percentages in parentheses refer to the ratio between the actual sample size and the nominal
one.

                                    Estimation of $\beta$          Estimation of $\gamma$
                                  Mean          Median            Mean          Median
Type of design          n         Bias    RMSE    Bias     MAE    Bias    RMSE    Bias     MAE
regressors $\chi^2(1)$  250      0.020   0.157   0.006   0.123  -0.020   0.326  -0.026   0.261
(56%)                   500      0.007   0.106   0.002   0.084  -0.016   0.230  -0.017   0.184
                        1000     0.002   0.073  -0.002   0.058  -0.031   0.163  -0.028   0.130
                        2000    -0.001   0.052  -0.002   0.042  -0.024   0.113  -0.023   0.091
                        4000     0.000   0.039  -0.001   0.031  -0.024   0.080  -0.022   0.063
additional regressors   250      0.052   0.155   0.041   0.118  -0.022   0.398  -0.039   0.320
(57%)                   500      0.017   0.097   0.013   0.076  -0.015   0.257  -0.022   0.205
                        1000     0.013   0.064   0.013   0.051  -0.033   0.182  -0.037   0.147
                        2000     0.003   0.048   0.001   0.038  -0.022   0.130  -0.022   0.104
                        4000     0.003   0.032   0.001   0.026  -0.016   0.090  -0.011   0.072
trending regressors,    250      0.030   0.171   0.016   0.129  -0.029   0.417  -0.036   0.328
T = 3                   500      0.013   0.117   0.001   0.092  -0.030   0.281  -0.028   0.225
(42%)                   1000     0.002   0.080  -0.004   0.064  -0.019   0.198  -0.014   0.158
                        2000     0.002   0.059   0.001   0.047  -0.034   0.145  -0.036   0.115
                        4000    -0.001   0.039  -0.003   0.031  -0.024   0.100  -0.028   0.080
trending regressors,    250      0.009   0.072   0.004   0.056  -0.015   0.168  -0.018   0.135
T = 7                   500      0.006   0.050   0.004   0.041  -0.013   0.122  -0.011   0.095
(78%)                   1000     0.002   0.035   0.001   0.028  -0.015   0.087  -0.013   0.068
                        2000     0.002   0.026   0.002   0.021  -0.014   0.060  -0.017   0.048
                        4000     0.002   0.018   0.001   0.015  -0.015   0.044  -0.015   0.036



estimators perform worse, in terms of bias and/or efficiency, than under the benchmark
design, the difference is slight. This happens for the $\chi^2(1)$ design (limited to
$\tilde\beta$), for the additional regressors design when $n$ is small and for the trending
regressors design when $T = 3$. Occasionally, the estimators also perform better than under the
benchmark design. Limited to $\tilde\gamma$, this happens, for instance, for the $\chi^2(1)$ design.

Finally, as regards the confidence intervals, we observed that the actual coverage level
is always very close to the nominal level for both parameters $\beta$ and $\gamma$. This confirms
the good quality of the method proposed in Section 4.3 for constructing confidence intervals,
already noticed for the benchmark design.



Table 5: Coverage levels of the confidence intervals (CI) based on the improved conditional
estimator under different simulation designs.

                                CI for $\beta$   CI for $\gamma$
Type of design          n           95%      80%      95%      80%
regressors $\chi^2(1)$  250       0.947    0.815    0.951    0.803
                        500       0.948    0.821    0.948    0.798
                        1000      0.960    0.794    0.940    0.802
                        2000      0.960    0.805    0.947    0.803
                        4000      0.952    0.805    0.934    0.779
additional regressors   250       0.941    0.811    0.955    0.817
                        500       0.942    0.800    [illegible in the source]
                        1000      0.945    0.803    0.945    0.795
                        2000      0.950    0.816    0.951    0.782
                        4000      0.946    0.794    0.956    0.800
trending regressors,    250       0.951    0.826    0.952    0.813
T = 3                   500       0.945    0.820    [illegible in the source]
                        1000      0.949    0.796    0.948    0.798
                        2000      0.955    0.805    0.943    0.789
                        4000      0.952    0.793    0.940    0.786
trending regressors,    250       0.940    0.805    0.949    0.796
T = 7                   500       0.954    0.799    0.945    0.815
                        1000      0.946    0.808    0.942    0.800
                        2000      0.947    0.798    0.945    0.801
                        4000      0.952    0.801    0.941    0.785



5.3 Comparison with the weighted conditional estimator 

An important issue is how the improved version of our approximate conditional estimator,
which we established to be much better than its basic version, performs in comparison to
the weighted conditional estimator of Honore & Kyriazidou (2000). We therefore compared their
simulation results with the simulation results illustrated above. An advantage of our estimator
over their estimator, in terms of bias and efficiency, seems to emerge clearly. The results of
this comparison are summarized in Table 6, which, for certain reference situations and for
both $\beta$ and $\gamma$, shows the median bias and the MAE of our estimator in comparison to
those of the weighted conditional estimator. For both estimators, the table also shows the rate^
between the actual sample size and the nominal sample size.

From Table 6 we can see that, as regards the parameter $\beta$, the advantage of our estimator

^ For the weighted conditional estimator, this rate is computed as the expected proportion of
pairs of response variables $(y_{is}, y_{it})$, $0 < s < t < T$, such that $y_{is} + y_{it} = 1$.



Table 6: Comparison between the weighted and the improved conditional estimator. Percentages
in the first two columns refer to the actual sample size under the two approaches (weighted -
improved). Percentages in the other columns refer to the reduction of median bias (in absolute
value) and MAE from the first to the second estimator.

                                      Estimation of $\beta$   Estimation of $\gamma$
$\gamma$  T     n     Estimator      Median Bias         MAE Median Bias         MAE
0.5       3     250   Weighted             0.076       0.154      -0.039       0.403
(37% - 57%)           Approximated         0.010       0.111      -0.027       0.285
                                           (87%)       (28%)       (31%)       (29%)
                1000  Weighted             0.038       0.086      -0.035       0.178
                      Approximated         0.002       0.053      -0.017       0.148
                                           (95%)       (38%)       (51%)       (17%)
                4000  Weighted             0.019       0.044      -0.035       0.102
                      Approximated         0.000       0.027      -0.021       0.077
                                          (100%)       (39%)       (40%)       (25%)
          7     250   Weighted             0.014       0.050      -0.053       0.131
(43% - 91%)           Approximated         0.001       0.049      -0.009       0.124
                                           (93%)        (2%)       (83%)        (5%)
                1000  Weighted             0.009       0.027      -0.041       0.075
                      Approximated        -0.001       0.024      -0.013       0.066
                                           (89%)       (11%)       (68%)       (12%)
                4000  Weighted             0.005       0.015      -0.033       0.039
                      Approximated         0.001       0.012      -0.010       0.032
                                           (80%)       (20%)       (70%)       (18%)
2         3     250   Weighted             0.196       0.251      -0.056       0.620
(26% - 42%)           Approximated         0.015       0.139      -0.056       0.419
                                           (92%)       (45%)        (0%)       (32%)
                1000  Weighted             0.113       0.136      -0.148       0.321
                      Approximated        -0.008       0.062      -0.083       0.200
                                           (93%)       (54%)       (44%)       (38%)
                4000  Weighted             0.063       0.074      -0.118       0.163
                      Approximated        -0.006       0.031      -0.079       0.120
                                           (90%)       (58%)       (33%)       (26%)
          7     250   Weighted             0.016       0.064      -0.195       0.227
(34% - 76%)           Approximated         0.001       0.055      -0.072       0.156
                                           (94%)       (14%)       (63%)       (31%)
                1000  Weighted             0.016       0.034      -0.160       0.164
                      Approximated        -0.002       0.028      -0.066       0.095
                                           (88%)       (18%)       (59%)       (42%)
                4000  Weighted             0.006       0.017      -0.116       0.116
                      Approximated        -0.001       0.014      -0.066       0.069
                                           (83%)       (18%)       (43%)       (41%)



$\tilde\beta$ is particularly evident for the case $n = 250$, $T = 3$ and $\gamma = 2$, in which
$\tilde\beta$ has a median bias of 0.015, whereas the weighted conditional estimator has a median
bias of 0.196. As regards the efficiency, the gain of our estimator seems to increase with $n$ and
$\gamma$ and is more evident for $T = 3$ than for $T = 7$. For the case of $n = 250$, $T = 7$ and
$\gamma = 0.5$, for instance, the reduction of MAE is just 2%, whereas it is 58% for the case in
which $n = 4000$, $T = 3$ and $\gamma = 2$. In most of the cases considered in Table 6, the
reduction of MAE is at least 15%.
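
The percentage reductions reported in Table 6 follow from a simple calculation, sketched below with a hypothetical helper function. The inputs are the values for $\beta$ copied from Table 6 (weighted estimator first, approximate estimator second).

```python
def reduction_pct(weighted, approximated):
    """Percentage reduction, in absolute value, from the first to the second estimator."""
    return 100.0 * (abs(weighted) - abs(approximated)) / abs(weighted)


mae_small = reduction_pct(0.050, 0.049)   # MAE, gamma = 0.5, T = 7, n = 250
mae_large = reduction_pct(0.074, 0.031)   # MAE, gamma = 2, T = 3, n = 4000
bias_case = reduction_pct(0.196, 0.015)   # median bias, gamma = 2, T = 3, n = 250
```

Rounded to the nearest integer, these reproduce the 2%, 58% and 92% entries of the table.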

As regards the parameter $\gamma$, the reduction of bias is particularly relevant when $T$ and
$\gamma$ are large. For instance, with $n = 250$, $T = 7$ and $\gamma = 2$, their estimator has a
median bias of -0.195, whereas our estimator has a median bias of -0.072. Similarly, the efficiency
of our estimator with respect to their estimator seems to increase with $\gamma$, whereas it has no
clear trend in $n$ and $T$. For instance, with $n = 250$, $T = 7$ and $\gamma = 0.5$, the reduction
of MAE from their estimator to our estimator is 5%, while it is equal to 41% for the case of
$n = 4000$, $T = 7$ and $\gamma = 2$. In most of the cases considered in Table 6, the reduction of
MAE is at least 25% and is usually more evident than for the estimation of $\beta$.

The main explanation that we can give for the results above is that, as may also be deduced
from Table 6, the actual sample size used in our approach is always much larger than that
used in the approach of Honore & Kyriazidou (2000). This difference increases with $\gamma$ and
$T$. For instance, with $\gamma = 0.5$ and $T = 3$, the actual sample size used in our approach is
about 1.5 times that used in their approach. This ratio becomes equal to about 2.1 for
$\gamma = 0.5$ and $T = 7$ and to 2.2 for $\gamma = 2$ and $T = 7$. Note, however, that the gain in
median bias and MAE does not closely follow the gain in the actual sample size. Other factors
therefore have to be taken into consideration, which may affect the performance of the two
estimators in a way that depends on $\gamma$ and $T$. We recall, in particular, that the
performance of our estimator depends on the quality of the approximation we are relying on, while
the performance of the estimator of Honore & Kyriazidou (2000) also depends on the fact that the
response configurations are differently weighted on the basis of the corresponding covariate
configurations and that, for $T > 3$, they are indeed relying on a pairwise likelihood.



6 Possible extensions 



In the following, we illustrate two possible extensions of the proposed approach: to the case
of dynamic logit models including more than one lagged response variable and to that of
multinomial logit models for categorical response variables with more than two levels. In
both cases, the approximate conditional inference outlined in the previous sections may be
implemented with minor adjustments.

6.1 More than one lagged response variable among the regressors

Sometimes, it may be of interest to know how long the dynamics of a certain phenomenon last.
In our context, testing for this length requires a dynamic logit model with more than one
lagged response variable.

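As an illustration, consider the case of two lagged response variables. The model described
in Section 2.1 becomes

$$p(y_{it}|\alpha_i, X_i, y_{i,-1}, \ldots, y_{i,t-1}) = p(y_{it}|\alpha_i, x_{it}, y_{i,t-2}, y_{i,t-1}) = \frac{\exp[y_{it}(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma_1 + y_{i,t-2}\gamma_2)]}{1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma_1 + y_{i,t-2}\gamma_2)}, \quad t = 1, \ldots, T, \qquad (21)$$

with $\gamma_1$ and $\gamma_2$ having an obvious interpretation and $y_{i,-1}$ and $y_{i0}$ assumed
to be exogenous. Along the same lines as in Section 2.1, it is straightforward to write the
distribution of $y_i$ given $\alpha_i$, $X_i$, $y_{i,-1}$ and $y_{i0}$ as

$$p(y_i|\alpha_i, X_i, y_{i,-1}, y_{i0}) = \frac{\exp(y_{i+}\alpha_i + \sum_t y_{it}x_{it}'\beta + y_{i\times 1}\gamma_1 + y_{i\times 2}\gamma_2)}{\prod_t [1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma_1 + y_{i,t-2}\gamma_2)]}, \qquad (22)$$

where $y_{i\times 1} = \sum_t y_{i,t-1}y_{it}$ and $y_{i\times 2} = \sum_t y_{i,t-2}y_{it}$.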

In this case, we can approximate the logarithm of the denominator with a first-order Taylor 
series expansion around $\alpha_i = 0$, $\beta = 0$ and $\gamma_1 = \gamma_2 = 0$, obtaining 

\[
\sum_t \log[1 + \exp(\alpha_i + x_{it}'\beta + y_{i,t-1}\gamma_1 + y_{i,t-2}\gamma_2)] \approx
\sum_t [\log(2) + 0.5\alpha_i + 0.5x_{it}'\beta] + 0.5\sum_t (y_{i,t-1}\gamma_1 + y_{i,t-2}\gamma_2).
\]
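
As a rough numerical illustration of how the first-order expansion of the log-denominator behaves, the following Python sketch compares $\log(1 + e^{\eta})$ with $\log(2) + 0.5\eta$ for a few values of the linear predictor $\eta$. The function names and the grid of values are illustrative assumptions and are not taken from the paper.

```python
import math

def log1pexp(eta):
    """Exact value of log(1 + exp(eta))."""
    return math.log1p(math.exp(eta))

def first_order(eta):
    """First-order Taylor expansion of log(1 + exp(eta)) around eta = 0."""
    return math.log(2.0) + 0.5 * eta

def approx_error(eta):
    """Absolute error of the first-order expansion at eta."""
    return abs(log1pexp(eta) - first_order(eta))

if __name__ == "__main__":
    # Hypothetical grid of linear-predictor values.
    for eta in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(f"eta={eta:4.2f}  exact={log1pexp(eta):.4f}  "
              f"approx={first_order(eta):.4f}  error={approx_error(eta):.4f}")
```

The error is zero at $\eta = 0$ and grows with $|\eta|$, which is in line with the remark that the performance of the estimator depends on the quality of the approximation.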

Therefore, by substituting the latter into (22) and after some algebra, we find that 
$p(y_i|\alpha_i, X_i, y_{i,-1}, y_{i0})$ may be approximated with 

\[
p^*(y_i|\alpha_i, X_i, y_{i,-1}, y_{i0}) =
\frac{\exp(y_{i+}\alpha_i + \sum_t y_{it}x_{it}'\beta - 0.5y_{i*1}\gamma_1 - 0.5y_{i*2}\gamma_2 + y_{i\times 1}\gamma_1 + y_{i\times 2}\gamma_2)}
     {\sum_z \exp(z_+\alpha_i + \sum_t z_t x_{it}'\beta - 0.5z_{*1}\gamma_1 - 0.5z_{*2}\gamma_2 + z_{\times 1}\gamma_1 + z_{\times 2}\gamma_2)},
\]

where $y_{i*h} = \sum_t y_{i,t-h}$ and $y_{i\times h} = \sum_t y_{i,t-h}y_{it}$, for $h = 1,2$, and $z_{*h}$ and $z_{\times h}$ are defined in a 
similar way, with $z_{-1} = y_{i,-1}$ and $z_0 = y_{i0}$. The approximating model is therefore a quadratic 
exponential model in which the main effect parameter for $y_{it}$ is equal to $\alpha_i + x_{it}'\beta - 0.5\gamma_1 - 0.5\gamma_2$ 
when $t = 1, \ldots, T-2$, to $\alpha_i + x_{it}'\beta - 0.5\gamma_1$ when $t = T-1$ and to $\alpha_i + x_{iT}'\beta$ when $t = T$; 
moreover, the two-way interaction effect for $(y_{is}, y_{it})$ is equal to $\gamma_1$ when $t = s+1$, to $\gamma_2$ 
when $t = s+2$ and to 0 otherwise. The advantage of this model is that of having a minimal 
sufficient statistic for $\alpha_i$ which is again $y_{i+}$, so that the conditional distribution of $y_i$ given 
$X_i$, $y_{i,-1}$, $y_{i0}$ and $y_{i+}$ does not depend on $\alpha_i$. The estimation of the structural parameters 
follows by maximizing a likelihood based on this conditional distribution in a way similar to 
that outlined in Section 4.1. In a similar way we can also compute standard errors for these 
estimates. 
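
To make the conditioning argument concrete, the following Python sketch evaluates the approximate conditional log-probability of one response sequence in the two-lag case by brute-force enumeration of all binary sequences with the same total score. It is only a sketch under stated assumptions: the function names, the argument layout (`X` as a $T \times k$ array, `g1` and `g2` for $\gamma_1$ and $\gamma_2$, `y_m1` and `y_0` for the two initial observations), and the requirement $T \ge 2$ are illustrative choices, and the enumeration is not meant to reproduce the algorithm of Section 4.1.

```python
import itertools
import numpy as np

def log_kernel(z, X, beta, g1, g2, y_m1, y_0):
    """Log-numerator of the approximating model without the alpha_i term.

    The alpha_i term is omitted because it cancels once we condition on
    the total score. Assumes T = len(z) >= 2.
    """
    z = np.asarray(z, dtype=float)
    lag1 = np.concatenate(([y_0], z[:-1]))        # y_{i,t-1} for t = 1..T
    lag2 = np.concatenate(([y_m1, y_0], z[:-2]))  # y_{i,t-2} for t = 1..T
    return (z @ (X @ beta)
            - 0.5 * lag1.sum() * g1 - 0.5 * lag2.sum() * g2
            + (lag1 * z).sum() * g1 + (lag2 * z).sum() * g2)

def cond_logprob(y, X, beta, g1, g2, y_m1, y_0):
    """Approximate conditional log-probability of y given its total score."""
    T, total = len(y), int(sum(y))
    # All binary sequences sharing the observed total score.
    support = [z for z in itertools.product((0, 1), repeat=T) if sum(z) == total]
    logs = np.array([log_kernel(z, X, beta, g1, g2, y_m1, y_0) for z in support])
    return log_kernel(y, X, beta, g1, g2, y_m1, y_0) - np.logaddexp.reduce(logs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)       # hypothetical covariates
    X = rng.normal(size=(4, 2))
    beta = np.array([0.5, -0.3])         # hypothetical parameter values
    print(cond_logprob((1, 0, 1, 0), X, beta, g1=0.8, g2=0.4, y_m1=1, y_0=0))
```

Summing such terms over the individuals in the panel gives an objective function that could be passed to a generic numerical optimizer; sequences with $y_{i+} = 0$ or $y_{i+} = T$ contribute a log-probability of zero and so carry no information.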

In the case outlined above, it may be interesting to test the hypothesis $\gamma_2 = 0$, under which 
model (21) specializes into model (2). In the present approach, this hypothesis may be tested 
in the usual way by using the statistic $\hat\gamma_2/se(\hat\gamma_2)$, where $se(\hat\gamma_2)$ is the standard error for $\hat\gamma_2$ 
estimated as described in Section 4.1. Under the null hypothesis, this statistic should 
approximately have a standard Normal distribution. 
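
The test just described can be sketched in a few lines of Python. The function name and the numerical inputs below are hypothetical; the two-sided p-value uses the standard Normal reference distribution mentioned in the text.

```python
import math

def wald_z_test(estimate, std_error):
    """Return the z statistic and two-sided p-value for H0: parameter = 0."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = estimate / std_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical estimate of gamma_2 and its standard error.
    z, p = wald_z_test(0.392, 0.2)
    print(f"z = {z:.3f}, two-sided p-value = {p:.4f}")
```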

6.2 Categorical response variables 

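Suppose that any response variable has $M$, instead of 2, possible levels, from 0 to $M-1$. The 
standard econometric model assumed in this case is the dynamic multinomial logit model 

\[
p(y_{it}|\alpha_i, X_i, y_{i0}, \ldots, y_{i,t-1}) = p(y_{it}|\alpha_i, x_{it}, y_{i,t-1}) =
\frac{\exp(\alpha_{iy_{it}} + x_{it}'\beta_{y_{it}} + \gamma_{y_{i,t-1}y_{it}})}
     {\sum_{m=0}^{M-1}\exp(\alpha_{im} + x_{it}'\beta_m + \gamma_{y_{i,t-1}m})},
\quad i = 1,\ldots,n, \; t = 1,\ldots,T,
\]

where $\alpha_{i0} = 0$ for any $i$, $\beta_0 = 0$ and $\gamma_{hm} = 0$ whenever $h = 0$ or $m = 0$. It is now convenient 
to use a dummy representation for the response variables $y_{it}$ and so let $a_{it}$ be an $(M-1)$-dimensional 
vector with all elements equal to 0, apart from the element $a_{itm}$, $m = y_{it}$, 
equal to 1 when $y_{it} > 0$. Thus 

\[
p(y_{it}|\alpha_i, x_{it}, y_{i,t-1}) =
\frac{\exp(\sum_m a_{itm}\alpha_{im} + \sum_m a_{itm}x_{it}'\beta_m + \sum_h\sum_m a_{i,t-1,h}a_{itm}\gamma_{hm})}
     {\sum_{b_t}\exp(\sum_m b_{tm}\alpha_{im} + \sum_m b_{tm}x_{it}'\beta_m + \sum_h\sum_m a_{i,t-1,h}b_{tm}\gamma_{hm})},
\]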

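where the sums $\sum_h$ and $\sum_m$ are extended to $1, \ldots, M-1$ and $b_t$ is an $(M-1)$-dimensional 
binary vector with elements $b_{tm}$. This vector has $M$ possible configurations, corresponding to 
the possible configurations of any $a_{it}$. Then, the conditional distribution of $y_i$, given $\alpha_i$, $X_i$ 
and $y_{i0}$, is equal to 

\[
p(y_i|\alpha_i, X_i, y_{i0}) =
\frac{\exp(\sum_m a_{i+m}\alpha_{im} + \sum_t\sum_m a_{itm}x_{it}'\beta_m + \sum_h\sum_m a_{i\times hm}\gamma_{hm})}
     {\prod_t \sum_{b_t}\exp(\sum_m b_{tm}\alpha_{im} + \sum_m b_{tm}x_{it}'\beta_m + \sum_h\sum_m a_{i,t-1,h}b_{tm}\gamma_{hm})}, \tag{23}
\]

with $a_{i+m} = \sum_t a_{itm}$ and $a_{i\times hm} = \sum_t a_{i,t-1,h}a_{itm}$. 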

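Proceeding along the same lines as in Section 3.1, we have to approximate the logarithm 
of the denominator of (23) through a first-order Taylor expansion around $\alpha_i = 0$, $\beta = 0$ and 
$\gamma = 0$. We have that 

\[
\sum_t \log\Big[\sum_{b_t}\exp\Big(\sum_m b_{tm}\alpha_{im} + \sum_m b_{tm}x_{it}'\beta_m + \sum_h\sum_m a_{i,t-1,h}b_{tm}\gamma_{hm}\Big)\Big] \approx
\sum_t \Big[\log(M) + \frac{1}{M}\sum_m(\alpha_{im} + x_{it}'\beta_m)\Big] + \frac{1}{M}\sum_m a_{i*m}\gamma_{m+},
\]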
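
The step behind this expansion can be spelled out as follows. This is a reconstruction under the notation of this section, with $\eta_{tm}$ introduced here only as a shorthand for the linear predictor of category $m$ at occasion $t$ and $\eta_{t0} = 0$ for the reference category.

```latex
\[
\eta_{tm} = \alpha_{im} + x_{it}'\beta_m + \sum_h a_{i,t-1,h}\gamma_{hm}, \qquad
f(\eta_t) = \log\Big[1 + \sum_{m=1}^{M-1}\exp(\eta_{tm})\Big],
\]
\[
f(0) = \log(M), \qquad
\frac{\partial f}{\partial \eta_{tm}}\Big|_{\eta_t = 0} = \frac{1}{M}
\quad\Longrightarrow\quad
f(\eta_t) \approx \log(M) + \frac{1}{M}\sum_{m=1}^{M-1}\eta_{tm}.
\]
\[
\sum_t \frac{1}{M}\sum_h a_{i,t-1,h}\sum_m \gamma_{hm}
= \frac{1}{M}\sum_h \Big(\sum_t a_{i,t-1,h}\Big)\Big(\sum_m \gamma_{hm}\Big).
\]
```

For $M = 2$ the expansion reduces to $\log(2) + 0.5\,\eta_{t1}$, which is the form used in the binary case.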

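with $a_{i*m} = \sum_t a_{i,t-1,m}$ and $\gamma_{m+}$ defined in an obvious way. Thus the approximating model is 

\[
p^*(y_i|\alpha_i, X_i, y_{i0}) =
\frac{\exp(\sum_m a_{i+m}\alpha_{im} + \sum_t\sum_m a_{itm}x_{it}'\beta_m + \sum_h\sum_m a_{i\times hm}\gamma_{hm} - \sum_m a_{i*m}\gamma_{m+}/M)}
     {\sum_B \exp(\sum_m b_{+m}\alpha_{im} + \sum_t\sum_m b_{tm}x_{it}'\beta_m + \sum_h\sum_m b_{\times hm}\gamma_{hm} - \sum_m b_{*m}\gamma_{m+}/M)},
\]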

where the sum at the denominator is extended to all the possible configurations of the binary 
matrix $B = (b_1 \cdots b_T)$ and $b_{+m}$, $b_{\times hm}$ and $b_{*m}$ are defined in an obvious way. 

It may be easily realized that $a_{i+m}$ are sufficient statistics for the incidental parameters $\alpha_{im}$ 
($i = 1, \ldots, n$, $m = 1, \ldots, M-1$) and so, as usual, we can rely on the conditional distribution 

\[
\frac{\exp(\sum_t\sum_m a_{itm}x_{it}'\beta_m + \sum_h\sum_m a_{i\times hm}\gamma_{hm} - \sum_m a_{i*m}\gamma_{m+}/M)}
     {\sum_B \exp(\sum_t\sum_m b_{tm}x_{it}'\beta_m + \sum_h\sum_m b_{\times hm}\gamma_{hm} - \sum_m b_{*m}\gamma_{m+}/M)},
\]

to estimate the structural parameters, where the sum $\sum_B$ is extended to all the matrices $B$ 
such that $b_{+m} = a_{i+m}$, $m = 1, \ldots, M-1$. 
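
The conditioning on the category counts can be illustrated with a small Python sketch that works directly with the category labels rather than with the dummy vectors. Everything in it is an assumption made for illustration: the function names, the storage of $\beta_m$ as rows of an array `B` with `B[0] = 0`, the storage of $\gamma_{hm}$ in an array `G` with zero first row and column, and the brute-force enumeration, which is only feasible for small $M$ and $T$.

```python
import itertools
from collections import Counter
import numpy as np

def mnl_log_kernel(y, X, B, G, y0):
    """Log-numerator of the approximating multinomial model without alpha terms.

    y  : sequence of categories in {0, ..., M-1}, length T
    X  : (T, k) covariates; B : (M, k) with B[0] = 0
    G  : (M, M) with G[0, :] = G[:, 0] = 0; y0 : initial category
    """
    M = G.shape[0]
    row_sums = G.sum(axis=1)          # gamma_{h+}, equal to 0 for h = 0
    lagged = (y0,) + tuple(y[:-1])    # y_{i,t-1} for t = 1..T
    value = 0.0
    for t, (prev, cur) in enumerate(zip(lagged, y)):
        value += X[t] @ B[cur] + G[prev, cur] - row_sums[prev] / M
    return value

def mnl_cond_logprob(y, X, B, G, y0):
    """Approximate conditional log-probability given the category counts."""
    M, T = G.shape[0], len(y)
    counts = Counter(y)
    # All sequences with the same number of occurrences of every category.
    support = [z for z in itertools.product(range(M), repeat=T)
               if Counter(z) == counts]
    logs = np.array([mnl_log_kernel(z, X, B, G, y0) for z in support])
    return mnl_log_kernel(y, X, B, G, y0) - np.logaddexp.reduce(logs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)    # hypothetical data and parameters
    X = rng.normal(size=(3, 2))
    B = np.vstack([np.zeros(2), rng.normal(size=(2, 2))])
    G = np.zeros((3, 3))
    G[1:, 1:] = rng.normal(size=(2, 2))
    print(mnl_cond_logprob((1, 2, 0), X, B, G, y0=0))
```

Fixing the counts of categories $1, \ldots, M-1$ also fixes the count of category 0 because $T$ is given, which is why the sketch compares full count tables.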



7 Conclusions 



We proposed an estimation approach for dynamic logit models for binary panel data allowing 
for unobserved heterogeneity and a lagged response variable in addition to strictly exogenous 
covariates. The approach is based on approximating the assumed logit model with a quadratic 
exponential model (Cox, 1972). On the basis of the latter we construct an approximate 
conditional likelihood which does not depend on the heterogeneity parameters, which are 
considered as incidental parameters. By maximizing this likelihood, we obtain an approximate 
conditional estimator for the other parameters of the logit model, i.e. the parameters for the 
covariates and that for the state dependence, which are referred to as structural parameters. 
We also show how this estimator may be improved by using a more precise approximation of 
the assumed logit model. The resulting estimator is the one we suggest using in practical 
applications. 

The main feature of the estimator above is that it is simpler to use and performs better 
than other conditional estimators existing in the literature. In particular, with respect to the 
weighted conditional estimator of Honore & Kyriazidou (2000), which we consider a benchmark 
estimator in this literature, our estimator does not require a kernel function for weighting the 
response configurations, and may also be used when $T \ge 2$, instead of $T \ge 3$, and in the presence 
of time dummies, without requiring particular adjustments. A more important aspect is that, 
usually, our estimator also has a smaller bias and a greater efficiency. This conclusion is based 
on a simulation study that we performed along the same lines as Honore & Kyriazidou (2000). 
In particular, we noticed that our estimator always has a limited bias. It also has a root 
mean square error and a median absolute error that decrease, as $n$ grows, at a rate close to 
$\sqrt{n}$. Moreover, the advantage in terms of bias and efficiency over the estimator of Honore 
& Kyriazidou (2000) is more marked when there is a strong state dependence effect. An 
intuitive explanation of the better performance of our estimator over their estimator is that the 
former is based on a conditional likelihood to which a larger number of response configurations 
contribute (actual sample size) than to the likelihood on which the other estimator is 
based. The larger actual sample size more than compensates for the fact that we are relying on 
an approximate conditional likelihood. 

In our approach, we also show how it is possible to estimate standard errors for the proposed 
estimator. These standard errors are estimated in the usual way on the basis of an information 
matrix which is obtained as a by-product of the estimation algorithm. On the basis of these 
standard errors we can construct confidence intervals for the structural parameters. As our 
simulation study shows, these confidence intervals usually have an actual coverage level very 
close to the nominal one and so we conclude that the suggested method for estimating the 
standard errors is adequate in practical applications. For this reason, we did not need 
to develop more sophisticated methods, based for instance on a bootstrap procedure, for 
estimating the standard errors. 
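
As a minimal sketch of the usual computation referred to here, the following Python function turns an information matrix into standard errors and Wald-type confidence intervals. The function name, the default 95% critical value, and the example inputs are assumptions for illustration; the information matrix is taken to be the one for the full sample, evaluated at the estimate.

```python
import numpy as np

def wald_intervals(theta_hat, information, z_crit=1.959964):
    """Standard errors and Wald confidence intervals from an information matrix.

    theta_hat   : (p,) vector of estimated structural parameters
    information : (p, p) information matrix at theta_hat (full sample)
    z_crit      : Normal critical value (default corresponds to 95%)
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    cov = np.linalg.inv(information)          # estimated covariance matrix
    se = np.sqrt(np.diag(cov))
    intervals = np.column_stack((theta_hat - z_crit * se,
                                 theta_hat + z_crit * se))
    return se, intervals

if __name__ == "__main__":
    # Hypothetical estimates (beta, gamma) and information matrix.
    se, ci = wald_intervals([1.0, 0.3], np.array([[4.0, 0.0], [0.0, 25.0]]))
    print(se)
    print(ci)
```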

In the present paper, we also outlined the extension of the approach to more complex 
structures for the state dependence, based on more than one lagged response variable among 
the regressors, and to dynamic multinomial logit models for categorical response variables 
having more than two categories. We reserve the development of both extensions and the 
assessment of the quality of the inference produced in these cases for future research. 

Appendix 

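Proof of Theorem 1. First of all consider that, under the quadratic exponential model 
(7), we can express the conditional distribution of any $y_i$, given $\alpha_i$, $X_i$ and $y_{i0}$, as 

\[
p^*(y_i|\alpha_i, X_i, y_{i0}) = \frac{\prod_t \eta_{it}(y_{i,t-1}, y_{it})}{\sum_z \prod_t \eta_{it}(z_{t-1}, z_t)},
\]

with $\eta_{it}(y_{i,t-1}, y_{it}) = \delta_{it}(y_{it})\exp(y_{i,t-1}y_{it}\gamma)$ and $\delta_{it}(y_{it}) = \exp(y_{it}\alpha_i + y_{it}x_{it}'\beta - 0.5y_{it}\gamma)$ if $t < T$ 
and $\delta_{it}(y_{it}) = \exp(y_{it}\alpha_i + y_{it}x_{it}'\beta)$ if $t = T$. Therefore, by marginalizing with respect to any 
response variable in backward order (from $t = T$), we obtain 

\[
p^*(y_{i1}, \ldots, y_{it}|\alpha_i, X_i, y_{i0}) =
\frac{[\prod_{s \le t} \eta_{is}(y_{i,s-1}, y_{is})]\, g_{i,t+1}(y_{it})}{\sum_z \prod_s \eta_{is}(z_{s-1}, z_s)},
\]

where, since $\eta_{it}(y_{i,t-1}, 0)$ is always equal to 1, the function $g_{it}(y_{i,t-1})$ is defined recursively as 

\[
g_{it}(y_{i,t-1}) = g_{i,t+1}(0) + \eta_{it}(y_{i,t-1}, 1)\, g_{i,t+1}(1), \quad t = T, \ldots, 1, \qquad g_{i,T+1}(\cdot) = 1.
\]

We therefore have that 

\[
\frac{p^*(y_{i1}, \ldots, y_{it}|\alpha_i, X_i, y_{i0})}{p^*(y_{i1}, \ldots, y_{i,t-1}|\alpha_i, X_i, y_{i0})} =
\frac{[\prod_{s \le t} \eta_{is}(y_{i,s-1}, y_{is})]\, g_{i,t+1}(y_{it})}{[\prod_{s \le t-1} \eta_{is}(y_{i,s-1}, y_{is})]\, g_{it}(y_{i,t-1})} =
\eta_{it}(y_{i,t-1}, y_{it})\,\frac{g_{i,t+1}(y_{it})}{g_{it}(y_{i,t-1})},
\]

which does not depend on $y_{i0}, \ldots, y_{i,t-2}$ and so $y_{it}$ is conditionally independent of these variables 
given $\alpha_i$, $X_i$ and $y_{i,t-1}$. From this conditional probability, expression (8) directly follows. 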

Finally, on the basis of a Taylor series expansion around $\alpha_i = 0$, $\beta = 0$ and $\gamma = 0$, we 
obtain 

\[
\log[g_{iT}(y_{i,T-1})] \approx \log(2) + 0.5(\alpha_i + x_{iT}'\beta + y_{i,T-1}\gamma)
\]

and then 

\[
g_{iT}(y_{i,T-1}) \approx 2\exp[0.5(\alpha_i + x_{iT}'\beta)]\exp(0.5y_{i,T-1}\gamma) = \exp(c_{iT})\exp(0.5y_{i,T-1}\gamma),
\]

with $c_{iT}$ denoting a constant term with respect to $y_{i,T-1}$. By substituting the latter in 
$g_{i,T-1}(y_{i,T-2})$ and following the same recursion above with the Taylor approximation used 
at any iteration, we obtain 

\[
g_{it}(y_{i,t-1}) \approx \exp(c_{it})\exp(0.5y_{i,t-1}\gamma), \quad t = 1, \ldots, T.
\]

The approximation $\log[g_{it}(1)/g_{it}(0)] \approx 0.5\gamma$ then follows. 
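
The final approximation can be checked numerically with a short Python sketch. It assumes the backward recursion $g_{it}(y_{i,t-1}) = g_{i,t+1}(0) + \eta_{it}(y_{i,t-1},1)\,g_{i,t+1}(1)$ with $g_{i,T+1} = 1$, which is what marginalizing the quadratic exponential model from $t = T$ backwards gives when $\eta_{it}(\cdot, 0) = 1$; the function names, the argument `xb` (holding the values $x_{it}'\beta$), and the parameter values are hypothetical.

```python
import math

def eta(t, T, y_prev, y_cur, a, xb, gamma):
    """eta_it(y_{i,t-1}, y_it); xb[t-1] holds x_it' beta and a is alpha_i."""
    main = a + xb[t - 1] - (0.5 * gamma if t < T else 0.0)
    return math.exp(y_cur * (main + y_prev * gamma))

def g_functions(T, a, xb, gamma):
    """Return a dict g with g[t] = (g_it(0), g_it(1)) for t = 1, ..., T + 1."""
    g = {T + 1: (1.0, 1.0)}
    for t in range(T, 0, -1):  # backward recursion from t = T
        g[t] = tuple(
            g[t + 1][0] + eta(t, T, y_prev, 1, a, xb, gamma) * g[t + 1][1]
            for y_prev in (0, 1)
        )
    return g

if __name__ == "__main__":
    T, a, xb, gamma = 3, 0.1, [0.05, -0.1, 0.0], 0.4  # hypothetical values
    g = g_functions(T, a, xb, gamma)
    for t in range(1, T + 1):
        ratio = math.log(g[t][1] / g[t][0])
        print(f"t={t}: log[g(1)/g(0)] = {ratio:.4f}  vs  0.5*gamma = {0.5 * gamma:.4f}")
```

With parameters close to zero the two quantities are close, and they coincide exactly when $\gamma = 0$; the gap widens as the parameters move away from the expansion point.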

Proof of Theorem 2: Let $Q_n(\theta) = \ell^*(\theta)/n$ and $Q_0(\theta) = E_0\{\log[p^*(y|X, y_0, y_+)]\}$. We 
first prove existence and consistency of $\hat\theta_n$ and then asymptotic normality. 

• (Existence and consistency) Under our assumptions, conditions (i), (ii) and (iii) of 
Theorem 2.7 of Newey & McFadden (1994) are satisfied and then, since $\hat\theta_n = \arg\max_\theta Q_n(\theta)$, 
we have that $\hat\theta_n$ exists with probability approaching 1 as $n \to \infty$ and $\hat\theta_n \to_p \theta_0$. In particular: 

(i) $Q_0(\theta)$ is uniquely maximized at $\theta_0$. Using a notation derived from Section 4.1, let 
$u(D, y_0, y) = (\sum_{t>1} y_t d_t', -0.5y_* + y_\times)'$. The first derivative of $Q_0(\theta)$ at $\theta_0$ may 
then be expressed as 

\[
\nabla_\theta Q_0(\theta_0) = E_0\{u(D, y_0, y) - E_0[u(D, y_0, y)|D, y_0, y_+]\} = 0. \tag{24}
\]

Moreover, the second derivative may be expressed as 

\[
\nabla_{\theta\theta} Q_0(\theta_0) = -E_0[S(D, y_0, y_+)], \tag{25}
\]

where $S(D, y_0, y_+) = V_0[u(D, y_0, y)|D, y_0, y_+]$, with $V_0(\cdot)$ denoting the 
variance-covariance operator under the true model. Note however that $u(D, y_0, y)$ may also 
be expressed as $A(D)w(y_0, y)$, with 

\[
A(D) = \begin{pmatrix} D & 0 \\ 0' & 1 \end{pmatrix}, \qquad
w(y_0, y) = \begin{pmatrix} y_{-1} \\ -0.5y_* + y_\times \end{pmatrix},
\]

and $y_{-1}$ denoting the reduced vector $y$ without the first element. Therefore, (25) 
may also be expressed as $-E_0\{A(D)V_0[w(y_0, y)|D, y_0, y_+]A(D)'\}$, which exists 
and is negative definite provided that $E_0(DD')$ exists and is of full rank. This 
is because $V_0[w(y_0, y)|D, y_0, y_+]$ is positive definite for any $y_0$ and $D$ and any $y_+$ 
between 0 and $T$, and the probability that $0 < y_+ < T$ is always positive. 

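(ii) $\theta_0$ is an element of the interior of a convex set $\Theta$ and $Q_n(\theta)$ is concave. That $\theta_0$ 
is an interior point of $\Theta$ is obvious since $\Theta$ is the whole real space of the same dimension 
as $\theta$. The concavity of $Q_n(\theta)$ directly derives from the concavity of $\ell^*(\theta)$ discussed at 
the end of Section 4.1. 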

(iii) $Q_n(\theta) \to_p Q_0(\theta)$ for any $\theta \in \Theta$. Since $Q_n(\theta)$ is the sample mean of random 
variables, each with the same expected value equal to $Q_0(\theta)$, this easily follows 
from the law of large numbers. Note, in particular, that this law may be applied 
since $Q_0(\theta)$ exists for any $\theta$ which, in turn, directly derives from the existence of 
$E_0[u(D, y_0, y)]$ ensured by that of $E_0(DD')$. 

• (Normality) It follows from Theorem 3.1 of Newey & McFadden (1994). In particular, 
the following conditions of this Theorem hold: 

(i) $\hat\theta_n \to_p \theta_0$ and $\theta_0$ belongs to the interior of $\Theta$ (see the proof above). 

(ii) $Q_n(\theta)$ is twice continuously differentiable in a neighborhood $\mathcal{N}$ of $\theta_0$. The second 
derivative is equal to minus the information matrix (17) divided by $n$, which is clearly 
continuous in any $\theta$. 



(iii) $\sqrt{n}\,\nabla_\theta Q_n(\theta_0) \to_d N(0, \Sigma)$. First of all we have that, because of (24), $E_0[\nabla_\theta Q_n(\theta_0)] = 0$. 
This implies that $V_0[\nabla_\theta Q_n(\theta_0)] = E_0\{\nabla_\theta Q_n(\theta_0)\nabla_\theta Q_n(\theta_0)'\}$. The latter may 
however be expressed as 

\[
E_0\{E_0[\nabla_\theta Q_n(\theta_0)\nabla_\theta Q_n(\theta_0)'|D, y_0, y_+]\} = E_0\{V_0[u(D, y_0, y)|D, y_0, y_+]\},
\]

which, in turn, is equal to $\Sigma = -\nabla_{\theta\theta} Q_0(\theta_0)$, which exists and is positive definite. 
The convergence to the Normal distribution therefore follows from the Central Limit 
Theorem. 

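(iv) $\sup_{\theta \in \mathcal{N}} \|\nabla_{\theta\theta} Q_n(\theta) + \Sigma\| \to_p 0$. This directly follows from Lemma 2.4 of Newey & 
McFadden (1994) and the fact that $E_0[\nabla_{\theta\theta} Q_n(\theta_0)] = -\Sigma$ and that $E_0[\|\nabla_{\theta\theta} Q_n(\theta)\|]$ 
is finite for any $\theta \in \mathcal{N}$. 

(v) $\Sigma$ is nonsingular. See item (iii) above. 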
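
For reference, the limiting distribution implied by these conditions can be written out as follows. This is a sketch that assumes Theorem 2 (whose statement is not reproduced in this section) has the usual form; $H$ is introduced here only as a shorthand for the limiting Hessian.

```latex
\[
\sqrt{n}\,(\hat\theta_n - \theta_0) \;\to_d\; N\big(0,\; H^{-1}\Sigma H^{-1}\big),
\qquad H = \nabla_{\theta\theta} Q_0(\theta_0).
\]
\[
H = -\Sigma
\quad\Longrightarrow\quad
H^{-1}\Sigma H^{-1} = \Sigma^{-1}.
\]
```

The first line is the conclusion of Theorem 3.1 of Newey & McFadden (1994); the second uses the equality between $\Sigma$ and minus the second derivative established in item (iii).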